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Summary 



One statistically meaningful technique to estimate the unknown quantum 
state based on a set of informationally complete measurement data is the 
maximum- likelihood method (ML). This technique yields a unique ML 
estimator for a given complete set of data. An iterative algorithm was 
proposed by Jaroslav Rehacek et al. to search for a positive estimator 
that maximizes the likelihood functional. We first show that this algorithm 
coincides with the steepest-ascent technique and develop a new algorithm 
based on the conjugate- gradient method that can be more efficient than the 
steepest-ascent version. We inspect the performance of this new algorithm 
with Monte Carlo numerical simulations. 

In general, however, the measurement data obtained from complex 
quantum systems are informationally incomplete and, as a rule, do not yield 
a unique state estimator. We establish an estimation scheme where both the 
likelihood and the von Neumann entropy functionals are maximized in order 
to systematically select the most-likely estimator with the largest entropy, 
that is, the least-bias maximum-likelihood and maximum-entropy estimator 
(MLME), consistent with a given set of measurement data. This is equivalent 
to the joint consideration of our partial knowledge and of our ignorance about 
the source to reconstruct its identity. The MLME technique is then applied 
to both experimental and simulation data. 

Next, we take a look at a recent proposal by R. Blume-Kohout - - the 
hedged maximum-likelihood method — for quantum state estimation and 
derive an iterative scheme (HML) to look for the estimator that maximizes 
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the hedged likelihood functional. We then report some interesting features 
of these HML estimators in the context of informationally incomplete mea- 
surements and compare them with the MLME estimators using numerical 
simulations. 

Entanglement detection via witness measurements is a useful technique 
to check if an unknown quantum state is an entangled one. The MLME 
algorithm can also be used to increase the efficiency of entanglement detec- 
tion, using the data obtained from measuring sets of witness bases. This is 
better than the conventional witness measurement strategy in which only the 
expectation value of each witness is estimated and used to infer the existence 
of entanglement in the unknown quantum state. In our proposed strategies, 
all information from the collected data is used to detect entanglement and 
when this fails, state estimation can be performed to estimate the unknown 
state. Adaptive strategies to measure these witness bases will also be 
presented. 

Finally, we also propose a similar algorithm, as in quantum state esti- 
mation, for incomplete quantum process estimation based on the combined 
principles of maximum-likelihood and maximum-entropy, to yield a unique 
estimator for an unknown quantum process when one has a set of informa- 
tionally incomplete data. We apply this iterative algorithm adaptively to 
various situations in order to minimize the amount of measurement resources 
required to estimate the unknown quantum process with incomplete data. 
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1.1 Table of the average number of iterations and average duration 
to complete one full run of the respective iterative schemes for 
the four-qubit GHZ and W states. The POM for the sim- 
ulations consists of the tensor products of four single-qubit 
tetrahedron outcomes. The above illustrates that on average, 
ML-CG I, which is ML-CG with fixed e^, performs better, in 
terms of the average duration of a full run, than the regular di- 
rect and conjugate-gradient schemes with optimization, even 
though the average number of iterations can sometimes be sig- 
nificantly reduced using the optimized schemes. The additional 
time taken for the type II algorithms is mainly due to the heavy 
matrix evaluations in the line search procedure 32 



2.1 Signatures of the relevant projectors in witness basis measure- 
ment set-up for polarization qubits (0=v, 1=h), detected by the 
set-up of Fig. 2.1, with no wave plates in the input ports. As a 
consequence of the Hong-Ou-Mandel effect [HOM87] (Chung 
Ki Hong, Zhe-Yu Ou and Leonard Mandel), the cases (1,0,0,1) 
and (0,1,1,0) do not occur 98 



2.2 The six witness bases of the kind depicted in Fig. 2.1 that enable 
full tomography of the two-qubit state. The second and third 
columns list the unitary operators U™ p and that describe 
the effect of the wave plates WPs 1 and WPs 2, respectively, on 
the polarization of the incoming photons. The fourth column 
states the three two-qubit operators whose expectation values 
are determined when the eigenstate basis of the corresponding 
witness is measured 99 
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2.3 The results of local unitary equivalences for sets in Classes 2 
to 6. Class 1 contains only three sets which are mutually re- 
lated by the qubit Clifford transformation that permutes the 
qubit Weyl operators. The value under the column "l-Vi", 
for instance, gives the number of l-Vi transformations that are 
performed on a fixed reference set in each of the families that 
falls in the class. For example, the first row says that for each 
family out of 17 in Class 2, including the reference set, there 
exists a total of 16 sets with 15 of them generated by apply- 
ing various types of n-V\ transformations on the reference set 
in the family. Families with the configuration (4,6,4,1,0,0), for 
instance, are due to the fact that two witness bases in the ref- 
erence set of every family, having the same u\,U2 settings, are 
unaffected by the V\ transformations and so there are (j) =4 1- 
V\ transformations, (2) = 6 2-V\ transformations, (3) = 4 3-Vi 
transformations and (^) = 1 4-Vi transformations. Every fam- 
ily in Class 4 has half of the 32 sets equivalent to the other half 
via an overall V\ transformation on the entire set. For instance, 
sets that are generated by the \-V\ and 5-Vi transformations 
on the reference set in a particular family are related via an 
overall V\ transformation and so on. Half the set generated by 
the 3-Vi transformations on the reference set is equivalent to 
the other half generated by the same type of transformations in 
the same manner. The entries under the last column includes 
the reference set in each family. There are 1392 informationally 
complete sets of witness bases out of these five classes. Together 
with the three sets in Class 3, there is a total of 1395 sets. . . . 106 
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1.1 Single-qubit state simulated with 10 3 detection copies over 100 
experimental runs. We analyze the performance of ML-CG (■) 
in terms of the average number of iterations over the experi- 
mental runs. Here the precision e is set to 10 -7 . In general, 
lower £ values can further boost the performance of both schemes. 29 



1.2 A total of 1500 random two-qubit full-rank mixed state were 
simulated with eight thousand detection copies over fifty exper- 
imental runs. By fixing the precision e = 10 -7 , the scatter plots 
of the average number of iterative steps over the experimental 
runs for ML-DG with fixed e k = e (ML-DG I) (Red), ML-DG 
with optimized e k (ML-DG II) (Blue) and ML-CG (Green) in- 
dicate an expected trend. For all the randomly chosen states, 
ML-CG outperforms ML-DG II with an average improvement 
of about 55%. On average, ML-CG requires about 95% less 
number of iterative steps than ML-DG I for the same precision. 31 



1.3 Here is the corresponding plot of the average duration of one 
complete run of each of the three schemes: ML-DG I(Red), 
ML-DG II (Blue) and ML-CG (Green). The average improve- 
ment on which ML-CG outperforms ML-DG II, in terms of 
the average duration of one complete run, is about 65%. The 
corresponding improvement of ML-CG over ML-DG I is about 
75% 31 
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1.4 Two-qubit tomography using joint trine POMs consisting of 
nine outcomes. A Monte Carlo simulation is performed with 
the number of detection copies N = 10 6 on a random true 
state described by a real statistical operator. The vertical 
axis represents the real matrix elements for both the true 
and reconstructed statistical operators in the computational 
basis. The horizontal axes respectively represents the row 
and column labels of the matrices. The trace-class distance 

£>tr = tr{|/5 M LME - Ptrue|}/2 IS 0.158 41 



1.5 Two-qubit tomography using a random two-qubit POM con- 
sisting of nine full-rank outcomes. A Monte Carlo simulation 
is performed with N = 10 6 on a random true state represented 
by a complex positive matrix of unit trace. The vertical axis 
in each of (a) and (b) represents the real matrix elements of 
the respective true and reconstructed statistical operators in 
some computational basis and that in each of (c) and (d) rep- 
resents the respective imaginary matrix elements. In this case 
V tI = 0.206 42 



1.6 A simulated experiment on a random state, in which 5000 
qubits were measured using a random imperfect two-outcome 
POM. The plot markers, which are indicated by dots, represent 
the entropies of the MLME estimators generated by the naive 
scheme starting from random states in the uniform distribution 
with respect to the Hilbert-Schmidt measure. 10 3 such estima- 
tors were computed. The thick solid line represents the entropy 
of the MLME estimator generated by Scheme B 49 
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1.7 A comparison of two different schemes for a fixed random in- 
complete POM with 10 3 random qubit true states distributed 
uniformly with respect to the Hilbert-Schmidt measure. Fifty 
experiments were simulated for every true state, with N = 5000 
for each experiment, and the average trace-class distance T)\ ^ g 
was computed. The entire simulation was done with a set of 
randomly generated, informationally incomplete POM consist- 
ing of three outcomes. A POM outcome was discarded to sim- 
ulate the situation in which two functioning detectors out of 
the three are registering the qubits. The plot markers denoted 
by "+" represent reconstructed states using Scheme A, and 
those denoted by "□" represent the reconstructed states using 
Scheme B. The missing probabilities estimated by the recon- 
structed states using the Scheme B are typically closer to the 
missing frequencies that would be measured if the discarded 
detector was functioning compared to those estimated by the 
reconstructed states using Scheme A. About 80% of the total 
number of true states respond better under the second scheme. 50 



1.8 Schematic diagrams of Z(A, p) on the space of statistical oper- 
ators. The maximally-mixed state resides at the center of the 
square base which represents the Hilbert space. At the extremal 
points of A, T{X = 0; p) = log(C({rij}; p))/N, with a convex 
plateau at the maximal value, and Z(A — > oo; p) = XS(p). Plot 
(c) shows the functional with an appropriate choice of value for 
A for MLME. An additional hill-like structure resulting from 
S(p) is introduced over the plateau, so that the estimator with 
the largest entropy can be selected from the convex set of ML 
estimators within the plateau 53 
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1.9 A simulation on quantum tomography on a randomly generated 
mixed state of light in the five-dimensional Fock space. In this 
plot, the number of copies of quantum systems measured is 
fixed at N = 10 4 . A choice of 20 quadrature eigenstates made 
up of four different t? settings, with five x values corresponding 
to each setting, which are projected onto this space was used 
and state estimators are constructed for different values of A. 
As A decreases, both the entropy and likelihood functionals 
approach their respective optimal values obtained from MLME 
(i.e. when A — > 0). When A is zero, there is a convex set of 
estimators giving the optimal likelihood value. For very large 
A values, the estimators approach the maximally-mixed state 
and hence S(p) approaches the maximal value log 5 60 



1.10 A simulation on quantum tomography on a randomly generated 
mixed state ptrue of light in the 20-dimensional Fock space with 
a slightly positive Woo = 0.141 . D tr and Woo respectively de- 
note the trace-class distance between the reconstructed estima- 
tor and the true state and the Wigner functional at the phase 
space origin, both averaged over 50 experiments with N = 10 4 . 
The same set of 20 quadrature eigenstates as in Fig. 1.9, 
projected onto this space was used and this set of measure- 
ments is informationally complete in the two-, three-, and four- 
dimensional Fock subspaces (shaded region). The values Woo 
and Ar were obtained by ML [SMBF93, OTBG06, NNNH+06] 
in subspaces of dimensions two to four, and by the MLME 
scheme in dimensions greater than four. The plot shows a 
strong dependence of Woo and D tr on the subspace dimension. 
In this case, it is obvious that the negativity of Woo inferred 
by a reconstruction in a subspace too small is just an artifact 
of the truncations. Also, Dt T decreases as the reconstruction 
subspace increases in dimension. This demonstrates the ad- 
vantages of the MLME scheme over the ML method 63 



1.11 A schematic diagram representing the time- multiplexed setup 
with K + 1 output ports. The Tjs are the respective trans- 
mission probabilities for the j'th beam splitter. The overall 
efficiency for, say, the kth. port is given by fjk = %(1 — Tfc + 
Tx+iSk,K+i) nf=i T j 65 
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1.12 Density plots of the Wigner functions, in phase space, of vari- 
ous statistical operators for (a) the true state (20-dimensional 
stationary state of a laser, \i = 4) with f « 0.394, (b) the 
5- dimensional ML estimator with f ~ 0.921 and (c) the 11- 
dimensional MLME estimator with f ~ 0.489. Here, brighter 
regions indicate the locations of larger Wigner function values, 
and vice versa. The statistical operator for (b) is obtained us- 
ing ML by assuming a 5-dimensional subspace in which the 
displaced POM outcomes are informationally complete. The 
statistical operator for (c) is obtained by assuming a larger 
subspace of dimension 11 using MLME. Numerous artificial 
non-classical features of the ML estimator, a signature of its 
highly oscillatory Wigner function, are manifested as an abnor- 
mally large value of f , an inevitable byproduct of state-space 
truncation. One can see that with MLME, extraneous artifacts 
of the Wigner function resulted from such a truncation can be 
largely removed 68 



1.13 Density plots of the Wigner functions, in phase space, of various 
statistical operators for (a) the true state (p a >, a' = 5), (b) the 
8-dimensional ML estimator, (c) the 10-dimensional and (d) 
1 5-dimensional MLME estimators. In this case, the Wigner 
function of the ML estimator differs greatly from that of the 
true state, an example of misleading information obtained via 
state-space truncation. A transition in the structure of the 
Wigner function occurs at -D S ub = 10, with the MLME estima- 
tor for -D su b = 15 giving a more accurate estimated picture of 
the Wigner function of the true state 69 



1.14 Schematic diagram of the diffraction patterns of an incoming 
light beam that is obtained from a SH wave front sensor. The 
light beam is transformed by an array of microlenses (aper- 
tures). A CCD camera is placed at the rear focal plane of the 
array. The measurement data consist of the measured inten- 
sities of the beam. The intensity at the jth pixel, located at 
position Xj, behind the kth microlens aperture is denoted by 
h{xj) 72 
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1.15 Experimental set-up involving a single-mode fiber (SMF), a 
spatial light modulator (SLM), an aperture stop (A) and a 
Shack-Hartmann (SH) sensor 77 



1.16 CCD image for the state p*^ 6 - The relevant part of the SH 
readout used for the beam reconstruction is shown. Contribu- 
tions from the individual SH apertures are indicated by bright 
spots, with each spot made up of multiple pixels. Note that 
the two void regions correspond to the phase singularities of 



the state p^l- This hints that « p s c H 80 

1.17 MLME state estimation from informationally incomplete data 
for -D su b = 9. The real (left) and imaginary (right) parts of 
the reconstructed coherence operator /5^ ME are shown. The 



reconstruction subspace is spanned by the modes LG;, with 
I = 0,1, ... ,8. In this case, 56 out of 91 independent outcomes, 
required for complete characterization of p*^ 6 , are not acces- 
sible, yet the MLME estimator p^ ME * s c l° se to p^oh' with a 
fidelity of 92% 81 

1.18 Average fidelities, computed over 50 random choices of compu- 
tational bases, of the estimators for different dimensions D S ub of 
the reconstruction subspace. The unfilled (filled) circular plot 
markers correspond to informationally complete (incomplete) 
tomography, respectively 83 

1.19 A numerical comparison between HML and MLME. A total of 
500 random true states ptrue are generated for each POM. For 
every true state, a total of 100 experiments for a fixed N = 500 
were simulated and the average trace-class distance V^ s was 
plotted. In each plot, for almost all the random states, the 
estimators Phml (+) and Pmlme (C) almost coincide on average. 90 
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2.1 A linear-optics set-up that offers an experimental implementa- 
tion of the optimal witness of Eq. (2.1.4) for polarization qubits. 
Two photons that are indistinguishable by their spatial-spectral 
properties are simultaneously incident on a half-transparent 
mirror, photon 1 from the left and photon 2 from the right. 
They carry one polarization qubit each, with their unknown 
two-qubit state to be analyzed. After being transmitted 
through, or reflected off, the half-transparent mirror, the pho- 
tons are detected behind polarizing beam splitters that reflect 
vertically polarized photons and transmit horizontally polarized 
ones. The four detectors LH, LV, RV, and RH must be able to 
discriminate between one- photon and two-photon events. The 
four eigenstates of the family of entanglement witnesses are dis- 
tinguished by different signatures; see Table 2.1. By letting the 
photons pass through polarization changing wave plates in the 
input ports, labeled by WPs 1 and WPs2, one realizes other 
witnesses that differ from the witness of Eq. (2.1.4) by local 
unitary transformations 97 

2.2 A simulation on the measurement of the set of six informa- 
tionally complete two-qubit entanglement witness bases for 10 
random two-qubit pure states, as well as full-rank mixed states, 
with the measurements done for one state at a time 109 
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3.1 Numerical simulations on the two-qubit (d = 2 2 ) and three- 
qubit (d = 2 3 ) quantum channels where D\ = D Q = d. The 
projectors of symmetric informationally complete POMs (SIC 
POMs) are chosen as the linearly independent input states for 
all the simulations (L = d 2 ). For the measurements, informa- 
tionally complete POMs consisting of tensor products of qubit 
SIC POMs are used (M = d 2 ). Each qubit SIC POM consists 
of a set of pure states whose Bloch vectors form the "legs of a 
tetrahedron" in the Bloch sphere. For the two-qubit channels, 
N = 10 4 and an average over 50 experiments is taken to com- 
pute the trace-class distances. For the three-qubit channel, the 
measurement data are generated without statistical noise. For 
unitary channels, one can see that the MLME algorithm can 
still give fairly accurate estimations with a smaller number of 
input states than that of a linearly independent set. Numeri- 
cal simulations of arbitrary two-qubit and three-qubit unitary 
channels suggest that the number is approximately d 2 /2 for 
SIC POM input states, above which there is insignificant to- 
mographic improvement 124 



3.2 A comparison of three incomplete QPT schemes: the non- 
adaptive MLME scheme, the adaptive MLME scheme and the 
adaptive MPL-MLME scheme. Monte Carlo simulations are 
carried out on two different types of imperfect cnot gates de- 
scribed in the text. Here, iV = 10 4 and an average over 50 
experiments is taken to compute the trace-class distances. For 
both the non-adaptive as well as the adaptive MLME schemes, 
the 16 linearly independent input states are chosen to be tensor 
products of projectors of the kets |0), |1), (|0) + |1})/v2 and 
(|0) + 1 1) i)/ v2- For all schemes, the POM outcomes are chosen 
to be the tensor products of qubit SIC POMs. The tomographic 
performance of the adaptive MPL-MLME scheme is the best 
out of the three. The plots show that the tomographic effi- 
ciency can be further improved by optimizing the input states 
over the Hilbert space instead of restricting to a fixed set of 
linearly independent input states, albeit the small difference in 
tomographic performance between the two adaptive schemes 
for some quantum processes 134 
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3.3 The dependence of the size of the likelihood plateau (A) and 
the normalized log-likelihood maximum on the number of in- 
put states. The respective performances of the non-adaptive 
MLME scheme, the adaptive MLME scheme and the adap- 
tive MPL-MLME scheme are computed based on noiseless 
measurement data for an imperfect cnot gate with e = 0.1. 
For both the non-adaptive MLME scheme and the adaptive 
MLME scheme, the 16 linearly independent input states are 
chosen to be tensor products of projectors of the kets |0), |1), 
(|0) + \l))/y/2 and (|0) + |1) i)/V2. For all schemes, the POM 
outcomes are chosen to be the tensor products of qubit SIC 
POMs. From the plot, the rate of decrease of A is the great- 
est with the adaptive MPL-MLME scheme. The increase in 
the normalized log-likelihood maxima with the adaptive MPL- 
MLME scheme may also be interpreted as greater maximum 
information gain after measurements using the optimal input 
states as compared to the other schemes 136 



3.4 A comparison of three incomplete QPT schemes: the non- 
adaptive MLME scheme, the adaptive MLME scheme and a 
combination of the adaptive MPL-MLME scheme and the adap- 
tive MLME scheme (hybrid scheme). Monte Carlo simulations 
are carried out on the imperfect cnot gate with e = 0.1. 
Here, N = 10 4 and an average over 50 experiments is taken to 
compute the trace-class distances. For both the non-adaptive 
as well as the adaptive MLME schemes, the default set of 
16 linearly independent input states are chosen to be tensor 
products of projectors of the kets |0), |1), (|0) + \l))/y/2 and 
(|0) + |1) i)/v2- For all schemes, a set of 16 randomly gener- 
ated positive operators, which are all linearly independent of 
one another, are used to form the POM. For this POM, the av- 
erage repetition frequency of the adaptive MPL-MLME scheme 
is very high after four input states are used. The first input 
state for all schemes is chosen to be the same separable state 
pp = 1 00) (00|. For the third scheme, the second to the fourth 
input states (shaded region) are optimized using the adaptive 
MPL-MLME strategy and the subsequent input states are cho- 
sen via the adaptive MLME strategy using the default set of 
input states which excludes 1 00) (00 1. The plot shows that the 
overall performance of the combined strategy is better than the 
adaptive MLME strategy alone 138 
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3.5 Numerical simulation on the imperfect two-qubit cnot gate 
with random noise for fixed LN = 10 . An average over 50 
experiments is taken to compute the trace-class distances. The 
adaptive MPL-MLME strategy is used when the number of 
input states L is less than 16 140 
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Chapter 1 

Quantum State Estimation 



r- SECTION 1.1 

Introduction 



Quantum state preparation is the first important step for any protocol that 
makes use of quantum resources. Examples of such protocols are quantum 
state teleportation and quantum key distribution which require entangled 
quantum states. In order to verify the integrity of the quantum state pre- 
pared by the source, one carries out quantum state tomography on the source. 
Measurements are performed on a collection of identical copies of quantum 
systems (electrons, photons, etc.) that are emitted from the source. Then, 
the quantum state of the source is inferred from the measurement data ob- 
tained from this collection. The measurements are generically described by 
a set of positive operators IX,- that compose a probability operator measure- 
ment (POM). After that, the measurement data obtained are used to infer 
the quantum state of the source. Such a procedure of state inference, which 
shall be our main focus in this dissertation, is also known as quantum state 
estimation. 

The central idea of quantum state estimation is to attribute a well-defined 
objective true state to each measured quantum system that is emitted from 
the source, making a connection with the frequentist's definition of classical 
estimation. An observer, after measuring a finite number of copies, will obtain 
a state estimator that is generally different from that obtained by another ob- 
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server, after measuring his own copies in a different way. This is not surprising 
since the quantum state of the source directly reflects the amount of informa- 
tion an observer gains after measuring his copies [CFS02]. As the number of 
copies approaches infinity, different estimation procedures ultimately lead to 
the same true quantum state of the source if the measurements completely 
characterize the source. However, such an idealized situation is never achiev- 
able in any laboratory setting, as one can only perform measurements on finite 
copies of quantum systems. As a result, the state estimator obtained will be 
different from the true state and depends on the details of the estimation 
procedure. To make statistical predictions, the corresponding operator p de- 
scribing this estimator must be a statistical operator, which is positive. This 
will ensure that the estimated probability pj = trjpIL/} for an outcome Uj of 
any set of POM is positive. We shall denote all estimated quantities with a 
"hat" symbol. 

The frequentist's notion of quantum state estimation, described above, is 
fundamentally different from the Bayesian point of view [PR04, CFS02], in 
which there is no objective true state of the source to be characterized. Rather, 
the quantum state of a given source is treated purely as knowledge that is to 
be updated by the measurement data obtained from finite copies, subjected to 
some prior information about the distribution of statistical operators. In the 
latter viewpoint, the quantum state of the source is naturally regarded as a 
subjective reality that is based on the measurements performed by an observer, 
rather than a definite state that is associated to the source. Unfortunately, due 
to its technical difficulty, a feasible Bayesian estimation scheme for quantum 
states is presently undeveloped. 

There are two popular methods for the frequentist's version of quantum 
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state estimation: Bayesian state estimation* and maximum-likelihood estima- 
tion (ML). The Bayesian state estimation method [SBC01, BKH06, BKlOb] 
constructs a state estimator from an integral average over all possible quan- 
tum states to estimate the unknown true state. The likelihood functional, 
which yields the likelihood of obtaining a particular sequence of measurement 
detections given a quantum state, serves as a weight for the average. This ap- 
proach includes all the neighboring states near the maximum of the likelihood 
functional as possible guesses for the unknown ptme- These neighboring states 
are given especially significant weight when N, the measured total number of 
copies, is small, in which case the likelihood functional is only broadly peaked 
at the maximum. However, the integral average unavoidably depends on how 
one measures volumes in the state space, and there is no universal and unam- 
biguous method for that. The ML method [Fis22, Hel76, PR04, RHKL07], on 
the other hand, simply chooses the estimator as the statistical operator that 
maximizes the likelihood functional. For a sufficiently large number of copies, 
both methods give the same estimator since the likelihood functional peaks 
very strongly at the maximum. 

When the measurement outcomes form an informationally complete set, 
the measurement data obtained will contain maximal information about the 
source. Thus, a unique state estimator can be inferred with ML. Unfortu- 
nately, in tomography experiments performed on complex quantum systems 
with many degrees of freedom, it is not possible to implement such an informa- 
tionally complete set of measurement outcomes. As a result, some information 
about the source will be missing and its quantum state cannot be completely 
characterized. For example, if a source produces a mode of light that is de- 
scribed by an infinite-dimensional statistical operator ptruej then no matter 

*Not to be confused with the Bayesian view of quantum estimation as discussed previ- 
ously. 
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how ingeniously a measurement scheme is designed to probe incoming pho- 
tons prepared by this source, an infinite amount of information about the 
mode of light will always remain unknown. The ML estimator obtained from 
these informationally incomplete data is no longer unique and there will in 
general be infinitely many other ML estimators that are consistent with the 
data. 

The standard approach to this problem is to apply an ad hoc truncation 
on the HUbert space and perform the state reconstruction in a particular sub- 
space. This results in a smaller number of unknown parameters that can then 
be uniquely determined by the measurement scheme. Since the truncation is 
largely based on the observer's intuition about the expected result, that is 
the true state that describes an infinite number of copies of such quantum 
systems, this cannot be a truly objective method [RMH08]. A more objective 
alternative is to consider the largest possible reconstruction subspace that is 
compatible with any existing prior knowledge about the source. For example, 
if an observer has prior knowledge about the range of the energy spectrum 
a given light source can have, he should consider the largest possible recon- 
struction subspace that contains quantum states describing the source in this 
range of energies. This inevitably introduces more unknown parameters that 
cannot be uniquely determined by the measurements and one should select 
the state estimator in this subspace that is least biased. 

In Refs. [TZE + 11] and [TSE + 12], we reported an iterative algorithm 
(MLME) to estimate unknown quantum states from incomplete measurement 
data by maximizing the likelihood and von Neumann entropy functionals. The 
application of this algorithm was illustrated with simulations and experimen- 
tal data and we concluded that, together with a more objective Hilbert space 
truncation, this approach can serve as a reliable and statistically meaningful 
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quantum state estimation with incomplete data. 

In this first chapter, we will discuss, at great lengths, the principles of 
quantum state estimation and establish some novel algorithms using various 
numerical methods. 

i- SECTION 1.2 1 

Preliminaries of quantum state estimation 



1.2.1 Estimation theory 

At the heart of estimation theory lies the principles of functional optimiza- 
tion [Hel76]. Typically, an objective functional involving the cost functional 
£(ptrue> of an estimator p for the unknown quantum state ptrue of a source 
is minimized based on the measurement data D. These measurement data are 
collected in an experiment carried out on the unknown source producing mul- 
tiple copies N of quantum systems, each prepared in the state ptrue- The data 
collection is usually done with a probability operator measurement (POM) 
such that J2j n? = 1 • 

Since ptmc is always unknown, in order to obtain a generically reliable 
estimator, the objective functional to be minimized has to be independent of 
P = Ptrue- There are many kinds of such objective functionals we can use. 
A typical kind of objective functional, which we will consider here as the 
main example, is one that accounts for all possible experimental data D one 
can obtain in an experiment. This allows us to find the estimator that is, 
in this sense, a universally optimal estimator for the cost functional that is 
independent of the data. To this end, we introduce the average cost functional 
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where (dr D ) is a pre-chosen integration measure for the D-dimensional Hilbert 
space and p(35 n p) is the probability of having 35 and the state p simulta- 
neously. The summation notation refers to an average over all possible 
35. The statistical identity p(D PI p) = p(D\p)ir(p) separates p(D n p) into 
a product of a conditional probability distribution and a prior probability 
distribution -ir(p) of all possible states p. The conditional probability p(35|p), 
which involves the data, is defined in terms of the likelihood functional £(35; p) 
inasmuch as 

p(p\p)= . . , . . (i.2.2) 

n /(dr D )7r(p)£(2);p) V ; 

The functional £(35; p) gives the likelihood of a state p yielding the measure- 
ment data 35. The prior probability distribution vr(p), on the other hand, 
reflects the prior knowledge one has about the source. One can define the 
prior (dp) = (dr D )7r(p). After inserting all the necessary elements, the objec- 
tive functional is given by 

C(P) = £^«^. (1.2.3) 
V J (dp )£(35; p 1 ) 

To proceed, we need to decide on the form of €(p,p), for the estimator p 
strongly depends on the cost functional. A very typical functional 

ftM .!M^W sll ( „ 4) 

2||©||2 

defined by the positive operators and S), can be used as the cost functional 
and this quantifies a "distance" between p and p. Here ||<5||2 refers to the 
operator 2-norm of defined as 

„ „ \/(v\ &® \v) 

g 2 = max VW W/ . 1.2.5 
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This is equal to the largest eigenvalue of (5 > 0, since for any ket \y), 



V(y\® 2 \y) 



tr<^ © 2 



\v) (y\ 
(y\y) 



V 3 

— flmax 



\y) 



/£i<* 



(1.2.6) 



In the derivation, the fact that < © = J2j 9j is exploited. 



To show that <£i(p,p) is indeed bounded from above by 1, we note that 



tr 



tr< 



<tr< 



< 



P + S) 
1 + tr{£} ' 

P + Sj 
1 + tr{£} 

P + F) 
1 + tr{£} 

p + f) 
1 + tr{£} 



tr< 



l + tr{£}y 



p)<3 

2 ' 



1 + tr{i}} 



p\ }tvi 



Q 

l+tr{fl} F 



. tr {(rfSfo-' i ) 2 } 



+ tr{p 2 } 



161 



<2||®||2- 



(1.2.7) 



In establishing the first inequality, the simple identity tr{p<5} < 
largest eigenvalue of © = \\<&\\2 is used. This general quadratic form £i(p, p) 
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has a unique minimum as long as <& > 0. Such a functional gives non-zero cost 
for p 7^ p and the special case = 1, = yields the familiar square of the 
normalized Hilbert-Schmidt distance (David Hilbert and Erhard Schmidt). 
An extreme case of such a cost functional is given by 



C 2 (p,p) = -6(p-p), (1.2.8) 

with which a singularly large reduction in cost is offered when p = p and no 
reduction is given otherwise. 

With <t(p,p) = <ti(p,p), the variation &£i(p, p) is 



tr 



{bp 



5C l( p, p) = ~ - ■ i g (^n^>) g J} . (1.2.9) 

2 © 2 



The total variation S£i(p) works out to be 

f(dp)£(V;p) 



5 ^) = -iE tr \ 8 p 



® I l+ttfs}) + {l+tr{f,} 1 15 



2||0|| 2 ^ ] " f(df/)£(P;f/) 



l — W5p [©(PB - P) + (/5b - p)&]} , 



2110112 , 



where 



feW) = 7W)W)/ (dp>£(S;rt T^' (L2 ' 10) 

Since minimizing £i requires that &<£i(p) = 0, we thus have p = Pb($))- 
The statistical operator pb(Sj) is known as the Bayesian estimator (Thomas 
Bayes) of ptrue for a given operator fj. A common variant of the Bayesian 
estimator [SBC01, BKH06, BKlOb] is defined as pb = Pb(0). In general, 
the integral average strongly depends on the definition of (dp), which has no 
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definite form whatsoever even when some constraints are imposed on (dr D ). 
For example, when D = 2 and spherical coordinates (r, $, (p) are used to 
parameterize the Bloch vector of p, the constraint of unitary invariance on 
(dp) fixes (dr2) = dJ7dr = d(/?di?sini?dr, but ir(p) = ir(r) can still take any 
function of the variable r. In this sense, there is an element of arbitrariness 
in the choice of (dp). Moreover, for a fixed form of (dp), the operator integral 
can be computationally difficult. 

A more straightforward estimation scheme would be to consider £(p, p) = 
^2(p,p)- The corresponding expression for ^(p) then simplifies to 



Thus, minimizing ^(p) amounts to looking for the estimator p = pml that 
maximizes the likelihood functional C(T>; p). This estimator is the maximum- 
likelihood (ML) estimator. In other words, to estimate ptmc whilst minimizing 
the objective functional ^(p) after an experiment, we need a scheme to search 
for a positive operator pml of unit trace such that the likelihood functional 
£(2); p) takes the largest value within the admissible space of quantum states 
p. There is an asymptotic connection between p& and pml- That is, when N 
is sufficiently large, the likelihood functional peaks very sharply around the 
maximum p = pml (£(2); p) &(p — Pml)) and, from Eq. (1.2.10), it follows 
that p B -> Pml- 

In a quantum-state tomography experiment, one can, in principle, measure 
N copies of quantum systems using detectors with perfect detection efficien- 
cies described by a POM ^ ■ IT; = 1, with j running over all detectors. The 
measurement data 35 = {rti, ni, 713, . . .} is a list of detection outcome occur- 
rences rij such that ^7 n i = N. One may also define the corresponding set 
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of measurement frequencies /,- = rij/N. For simplicity, we shall consider the 
POM to be informationally complete. This means that there are D 2 linearly 
independent outcomes in the POM that span the space of D-dimensional sta- 
tistical operators. Therefore, this type of POM fully characterizes the source 
and maximal information can be extracted from the measurement data to 
reconstruct ptrue uniquely. Since the detection of one copy is independent of 
another, the detection occurrences n,- follow a multinomial distribution and 
so the corresponding likelihood functional for this scenario is 

£({«*}; p) = II/'/ = j , (1-2-12) 

with pj = trj/jllj}. 

One can construct an operator that maximizes C({n,j};p) whilst paying 
no heed to the positivity constraint. To do this, we introduce a transposition 
mapping on a given operator ^ = |a)7(6| of complex a, b and 7 into an 
extended Hilbert space [Sco06]: 

* = |a)7(6|i-^|o)|6)7=|*>. (1.2.13) 

The notation PP^ denotes a superket. It is a ket that lives in an extended D 2 - 
dimensional Hilbert space and is derived from an operator in a D-dimensional 
Hilbert space. Analogously to operators, one can define a D 2 -dimensional su- 
peroperator |^^$| living in this extended Hilbert space. The simple identity 

=tr|V$} (1.2.14) 

follows from these notations. 

Under this formalism, we can systematically study the linear independence 
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of the POM outcomes. The first step is to note that for a set of iVo linearly- 
independent POM outcomes, if the equation 



El n ;K = 



(1.2.15) 



is to be satisfied for a given vector c = (ci, C2, . . • , cn ) t , then c must be zero 
since none of the outcomes can be expressed as a linear combination of the 
rest. In vector notations, Eq. (1.2.15) amounts to the scalar product relation 



n, , n, 



n 



Not 



C-2 



\CN J 



0. 



Defining the positive matrix 971 with matrix elements 



(1.2.16) 



m jk = (Uj\U k ) 



tT{UjU k } 



(1.2.17) 



The statement in (1.2.16) implies that the only solution to the matrix equation 
97t-c = is c = 0. In the language of linear algebra, we say that the null space 
of 97T has dimension zero. It follows that the rank of 97t is iVo- We have thus 
constructed a positive matrix 97T that has Nq positive eigenvalues out of a set 
of No linearly independent superkets | Hj ]> . This matrix is known as the Gram 
matrix (J0rgen Pedersen Gram). The largest value of Nq is D 2 since this is 
the maximum number of linearly independent operators spanning the space 
of Hermitian operators as a basis. Therefore, a POM contains the maximal 
set of D 2 linearly independent outcomes if the corresponding Gram matrix 97t 
has a rank of D 2 . 
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One can also define the frame superoperator 

^ = El n i»< n il" (1.2.18) 

5 

With this, an equivalent criterion for a set of informationally complete POM 
outcomes ELj is that the superoperator T is invertible. There exist dual su- 
perkets |©j} of |n^ with the property 

E l n i»( il = I = E l%»( n il > (1.2.19) 
3 3 

where X is the identity superoperator. The dual property is elucidated by the 
following equalities: 

Pj = tr{pIX,} 

= (pPM 

= EM n »X e *h>- 

k 

Since the final equality is always true for any p, it follows that ^©Jllj^ = 
trjOfcllj} = 5jk- Using Eq. (1.2.18), the dual superkets can be defined as 

l%» = ^ 1 |n i » (1.2.20) 

and it is straightforward to verify that Eq. (1.2.19) is immediately satisfied. 
If, in addition, the number of ILjs is exactly D 2 (minimal POM), then the 
dual superkets |©j} are uniquely defined as in Eq. (1.2.20). For overcomplete 
measurements, there is more than one way of defining these dual superkets and 
the |9j))sinEq. (1.2.20) serve as the canonical dual superkets. As an example, 
we consider a Z)-dimensional symmetric informationally complete POM (SIC 
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POM) [LLLK08, App05, ADF07, REK04, RBKSC04, SG10, Sco06] whose 
subnormalized rank-1 outcomes Hj, that is tr{Ilj} = 1/D, are such that 

(n;|n fe » = t r{ILn fc } = ■ (1-2.21) 

The corresponding dual superkets for this POM can be shown (see Appendix 
A) to be 

|e i > = |n i >D(D + i)-|i). (1.2.22) 

With all the necessary tools in place, we can now define the operator that 
maximizes the likelihood functional over all Hermitian operators: 

« Y^f^j- (1-2.23) 

3 

To verify that this is indeed the solution, we note that pj = tr{<5TIj} = fj 
are the solutions that maximize the likelihood functional in Eq. (1.2.12). A 
simple calculation shows that 

tr{<7EL,.} = ^ / & tr{n,e fe } = ^ fk$jk = fj ■ 
k k 

Alternatively, the estimator a in Eq. (1.2.23), also known as the linear- 
inversion estimator, can be obtained by directly inverting the set of D 2 con- 
straints trjallj} = fj for minimal informationally complete data. An essential 
tool for linear-inversion is a complete set of Hermitian, trace-orthonormal ba- 
sis operators Fj = rt such that tv{TjTk} = 5jk- By "complete", we mean 
that the superkets satisfy the completeness relation 



(1.2.24) 
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With this basis, one can express the operators a = Ylk tk^k an d Hj = 
Sfc c jk^k i n terms of Tj. The coefficients i& can thereafter be obtained by 
inverting the system of linear equations 

fj = ^cjktk . (1.2.25) 

k 

The estimator a is the ML statistical operator we seek if a > for the 
measurement data. We say that a is an unbiased estimator for ptme since the 
operator a = ^ ■ fj&j = J2j pf ue< 3>j = Ptme- This means that the set of all 
possible estimators a, for a given N, forming an uncertainty hyper- ellipsoid 
is such that the operator centroid of the set is ptme- Because of this fact, 
the estimator a is generally not a positive operator. Geometrically, part of 
the boundary of the uncertainty hyper-ellipsoid around ptrue that contains all 
estimators a can lie outside the state space for finite N. As iV increases, the 
hyper-ellipsoid shrinks to a point in the state space when iV becomes infinite. 
In other words, as long as iV is finite, if the true state lies on the boundary, 
then no matter how small this hyper-ellipsoid is, there will always be estima- 
tors that are not positive. For them, it follows that the true peak of C(D; p) 
lies outside the state space and the resulting positive ML estimator pml that 
maximizes £(2); p) inside the state space must necessarily be rank-deficient. 
In this case, there is no analytical expression for the positive estimator and 
numerical methods are needed to look for this estimator. The positive ML es- 
timator, like <7, is also a consistent estimator, which is defined by the property 
that pml approaches ptme a s N increases[TZE10, ZE11]. 
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1.2.2 Uncertainties in quantum estimation 



The usual distance functional 

C H - S (ptrue " P) = tr{(/W - f>) 2 } , (1-2.26) 

reminiscent of the Hilbert-Schmidt distance, is a common measure of the 
average deviation of an estimator p away from the true statistical operator 
ptrue and is known as the covariance of pt rue and p. To evaluate this functional, 
we express the operators ptrue = Ylj fj Tue ^j and p = Ylj tj^j m terms of a set 
of Hermitian, trace-orthonormal basis operators Tj. The resulting functional 
becomes 

Ch-S {t,r tlne ) = (t- r trU e) • it- r t rue) , (1.2.27) 

where t = r true . The corresponding dyadic 

t (t, r tmc ) = (t- r truc ) (t- f tmc ) (1.2.28) 

is known as the covariance dyadic and is positive. 

More generally, the covariance dyadic describes the mean squared-error 
between p and p trU e in terms of their respective coefficients. The average of a 
function f(D) of the data D is given by 

jm = Y, = / ( d ®M®IP) /(») • (1.2.29) 

There exists a lower bound for the covariance dyadic and to calculate it, 
we assume that p is unbiased, which as a consequence need not be positive, 
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and note that 



—J 
df 



r = r tr 



/(d2))^p(2)|p) 



J (dD)p(D|p) 



d_ 
df 



log (p(®\p)) 



df 



V. 



r = rtr 



(1.2.30) 



r = r tr 



where 1 is the unit dyadic, and 



| (dD)p(S)|p) 



d_ 

df 



-log 



d_ 
df 

0. 



(1.2.31) 



r = n r 



Combining Eqs. (1.2.30) and (1.2.31), we have 



(dS)p(S5|p) 



i9f 



log (p(2) |p)) 



(1.2.32) 



r = n r 



Multiplying the vectors x and y respectively on the left and right of 
Eq. (1.2.32) gives 



/ 



(d2))p(2)|p) 



d 

x-^log (p(£|p)) 



(t- f) ■ y 



x-y. (1.2.33) 



r = r tT 



By the Cauchy-Schwarz inequality (Baron Augustin-Louis Cauchy and Karl 
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Hermann Amandus Schwarz), 



yf = J (d»)p(S)|p) 
< J (d2>)p(2)|p) 



£■ ^log(p(3)|p)) 



dr 



-log(p(lD|p)) 



(7 - f) • y 
9 



r = rtr 



(9r 



log (p(S)|p)) 



y (d2))p(2)|p) y • (t- r t rue) (t - w) ■ y 



r = r tr 



3f 



-log(p(D|p)) 



d_ 
dr 



-log(p(D|p)) 



r = r tr 



= ^(Fisher's information dyadic) 



y-{t~ r true ) (f- rtrue) ■ y , 



^(f.rtrue) 



(1.2.34) 



where F is the Fisher's information dyadic (Sir Ronald Aylmer Fisher). A 
substitution of x = V" -1 • y gives the inequality 



y ■ V 1 ■ y < y ■ tf (t, rtrue) • y , 



(1.2.35) 



which is satisfied for any y. This implies that 



< ^fft r true ) 



(1.2.36) 



The inequality presented above is the famous Cramer-Rao inequality (Harald 
Cramer and Calyampudi Radhakrishna Rao) for unbiased estimation. It tells 
us that the lowest mean squared-error tr| C (t, f tmc ) j is given by tr| F 

It is interesting to study the asymptotic expression for the Fisher's dyadic 
*F* when N is large. To begin, we note that for sufficiently large N, the Central 
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Limit Theorem tells us that [RMH08] the conditional probability distribution 



p(V\p) 



exp j-i (r - t M h) ■ ^ml • *ml) | 



/ (2tt)^ 2 det{^ ML } 

(1.2.37) 

takes a Gaussian form (Johann Carl Friedrich Gauss), where £ml is the vector 
of coefficients for pml- With this, 



_l0g(p(J)|p)) = --- 



*ml) • C M l • (r — £ml) 

(r- Fml)} 



2 



j{ 

C ML" (^-4lL) 



and so 



^ ^. — — 

c ml • *ml) (r - Tml) 



r = r tr 



u ML 



- L ML 



u ML ' 



(1.2.38) 



An important lesson learned here is that for large N, the unbiased ML es- 
timator />, on average, approaches the lower bound (Cramer-Rao bound) set 
by the Cramer- Rao inequality asymptotically. The unbiased ML estimator is 
thus said to be an efficient estimator, that is, no other unbiased estimator 
can achieve a lower asymptotic mean squared-error. When the positivity con- 
straint is taken into account, the Cramer-Rao inequality will be modified to 
accomodate the constraint [Mar93, MSK08] and it was shown that the cor- 
responding constrained ML estimator is efficient in terms of the constrained 
Cramer-Rao bound. 
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Equation (1.2.38) provides an operational way to compute the uncertain- 
ties of a real quantity q = ^{pmlQ} , where 



q = tr{p ML <2} = tT{ptmcQ} = qtrue • (1.2.39) 

The corresponding Hermitian operator Q can be similarly expressed in terms 
of the set of operator basis {Tj} such that Q = Y^j Qj^j- Note that its variance 



(Aq) 2 = (q - qtrue) 2 = tr{(p ML - Ptrue) Q} 2 



— * 2 
[q- (*ML ~ rtvue)] 



q- (*ML — Hrue) (*ML ~ H;rue) 

q ■ C ml ■ q ■ 



Equation (1.2.38) tells us that C ml is the inverse of the Fisher's information 
dyadic F for sufficiently large N. This leads to [RMH08] 

(Aq) 2 = q-f- 1 -q. (1.2.40) 

Hence the meaning of the Fisher's information dyadic is quite clear for large 

N: it directly quantifies the uncertainty of the real quantity q and carries the 

same amount of information as C ml- If C ml is non-invertible, then F will 

^ ^ 

carry information in the support of C ml- 
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v- SECTION 1.3 

Informationally complete quantum state 
estimation 



1.3.1 Steepest-ascent (direct-gradient) algorithm 

Suppose an informationally complete POM, consisting of D 2 linearly indepen- 
dent outcomes, is used to reconstruct an unknown true state ptrue of dimension 
D. The detection of N copies of quantum systems yields a multinomial statis- 
tic for the measured number of occurrences rij for every outcome IT,, and the 
corresponding likelihood functional £({n.j}; p) is given in Eq. (1.2.12). To look 
for /5ml numerically, we first vary the log-likelihood log £({rij}; p) and obtain 



Note that maximizing the likelihood functional requires 5 log £({rij}; p) = 
0. To increase the value of 61og£({nj}; p) when the maximal value of 
C({rij}; p) is not reached, we need to look for a suitable variation bp such 
that 6 log £({rij}; p) > while maintaining the positivity of p. 

We begin by parameterizing the statistical operator 




(1.3.1) 



where 




(1.3.2) 



A^A 



(1.3.3) 



P = 



tr{AU} 
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with an auxiliary complex operator A. Under this parametrization, 



bp = — — — =-tr^ bA'A + A ] bA \ 



bA^A + JJbA _ A^A 
hMU} t^{AUp 
_ bA^A + jtfbA tr{5.4U + ^t 6 ^} 
~ ~~ tr{AU} P tr{AU} 

It follows that, 



61og£({n i} ;,) = tr jtf ^ - p j j 

_ f R^j-tv{Rp}^ M AR-tr{Rp}A \ 
- tT \ bA t r{AU} + A tr{AU} J 

= ^f4} tr { M " 1} Aj ] + Mt [A {R " 1)] } ' (L3 - 4) 

In deriving Eq. (1.3.4), the identity tr{i?p} = 1 is invoked. By setting 
5 log C({nj}; p) = 0, we arrive at the extremal equation for the positive ML 
estimator p ML [RHKL07, PR04]: 

RmlPml = Pml-Rml = Pml , (1.3.5) 

where the operator Rml is the operator R, defined in Eq. (1.3.2), evaluated 
with the ML estimator /5ml- 

One way of ensuring a positive 6 log£({nj}; p) every step is to note that 
the definition of the variation of log£({nj}; p), in the form of a trace equation, 
is given by 

61og£( { n j} ;p) = t, W ^ffi* " )' + M t ^ log rt ^' 

(1.3.6) 

where the partial derivative dlogC({nj}; p)/dA = A{R—l)/tT\AJA\. Noting 
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that the gradient of log C({nj}; p), which we now define to be a two-component 
vector dlogC({rij}] p), is given by 



d log £({n,-};p) 



dlog£({ nj }; P )/dA 

d\ogC{{ n] }-p)/dA\ 



(1.3.7) 



we can enforce the variation of Z = (A, A^) T to follow the direction of the 
steepest ascent up to the global maximum. In other words, 



8Z 





oc d log £({rij};p), 



(1.3.8) 



where e is a small positive parameter. Correspondingly, 

51og£({n j }; /0 ) = etr{(i? - l)p(R - 1)} . 
Thus, one derives the iterative equation, in discrete form, as 

Pk+l 



(1.3.9) 





l + f(i2*-l) 


Pk 


l + f(ijfc-l) 






l + ^R k -l 


)] 




l + ^R k -l 


)} 


} 



^1.3.10) 



which is precisely the iterative equation for the ML scheme established in 
[RHKL07, PR04]. It is now clear that ML is actually the method of steepest- 
ascent to search for /5ml • Since this enhanced algorithm attempts to reach the 
global maximum by directly following the gradient, this method can also be 
called the direct- gradient method (ML-DG). Hence, given the above iterative 
equation, one can attempt to obtain the ML estimator that gives the global 
maximum of log£({nj}; p). Numerically, the estimator /3ml = Pk is taken such 
that tr{|(i? K — 1) p K \} < e, where tr{|M|} = tr| \/MtM| is the trace norm 
for an operator M and e is a pre-chosen precision and must not be confused 
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with the small parameter e. One can also introduce an enhancement in the 
rate of convergence to this scheme by attempting to optimize the value of e 
in each step of the iteration so that the log-likelihood functional is maximized 
efficiently. This procedure is usually known as the line search and can be done 
in various ways. 

Outlined below is the iterative algorithm for the ML estimation scheme 
[RHKL07, TZE10]. 



ML algorithm using the steepest-ascent method (ML-DG) 


Starting from the maximally-mixed state p\ = 1/D, with k = 1 and a 


small fixed value of e, 


1. 


Compute Rk] 




• Escape from loop if tr{ \RkPk — Pk } < e; 




• Otherwise, proceed to the following steps. 


2. 


Carry out the line search procedure: 




• Use two trial values of to compute two Pk+iS and determine 




the value of the likelihood C({nj}; pk+i) for both. 




• Combine these two with C({nj}; pi-), which was in fact ob- 




tained from €k = 0, and compute a quadratic function of e& 




that interpolates between the three support values. 




• Find the value for which the quadratic function assumes 




its maximum. 


3. 


Use this maximizing to evaluate the new pk+i using Eq. (1.3.10), 




with e replaced by e^. 


4. 


Set k = k + 1 and repeat the iteration from the beginning. 
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The optimization of introduced here is practical since the exact maximum of 
efc is in general hard to compute. Such a line search optimization can in prin- 
ciple expedite the search of /5ml- However, when D and the number of POM 
outcomes are huge, such a procedure becomes impractical since it involves the 
evaluation of very many large matrices, which can be very computationally 
expensive. In this fixed value of = e is used instead. 

1.3.2 Conjugate-gradient algorithm 

The steepest-ascent, or direct-gradient, method seeks the extremal solution of 
a given function by following the path of its steepest gradient in the space of 
parameters. It may happen that in an iterative step, the path is quite parallel 
to another one taken in one of the previous iterative steps. In other words, 
the iterated answer traces out a "zig-zag" path in the space of parameters as 
it approaches the true extremal point. This causes a considerable retardation 
in the iteration if the precision e is chosen too small. Another alternative to 
this method is the approach of conjugate gradient. Initially developed for real 
quadratic objective functions of the form 

f(z) = -h-' r l-z + b-z, (1.3.11) 

where the real dyadic A > and the dimensionality of the real vectors is n, 
the conjugate-gradient (CG) iteration takes a path which "circulates" directly 
to the extremal point z = z max in exactly n iterative steps. Technically speak- 
ing, the search directions hu taken in the fc-th step is such that hk ■ A ■ hi = 
for j ^ k. This conjugacy property is where the name of this approach is 
derived. One can obtain a complete set of conjugate direction vectors using 
the Gram-Schmidt conjugation strategy (J0rgen Pedersen Gram and Erhard 
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Schmidt), a modified orthonormalization technique which accounts for the 
conjugacy property. However, this strategy ultimately requires all direction 
vectors to be stored into memory, since a linear combination of all the previ- 
ously computed direction vectors is required to compute the next one. Such 
a procedure can be computationally expensive for large n. 

In the CG method, the gradient vectors V n f(z k ) for every z k are used 
to compute the set of conjugate direction vectors h k . Here, V n is the n- 
dimensional gradient vector and V n /(ij%) = b— A ■ z k . With this substitution, 
the linear combination of h k s in the Gram-Schmidt conjugation procedure 
becomes just a single term^ and so there is no longer a need to store all the 
previously computed direction vectors. In each step, the conjugate direction 
vector h k and z k are thus generated pairwise. 

The CG algorithm for the quadratic form in Eq. (1.3.11) is outlined below: 

CG method for quadratic forms 

Beginning with h\ = g\ = b — A ■ z\ and k = 1, 

1. Compute efc = JX^^t and set z^+i = zj, + e^h^- This value of 
corresponds to the maximum value of f(zk+i) after a line search 
procedure. 

2. Set g k+ i = g k - e k A ■ h k . 

3. Set the parameter t k+ \ = gfc t^ +1 . 

4. Set h k+ i = g k+ i + t k+ ih k . 

5. Set k = k + 1 and repeat the iteration from the beginning. 

Very often, the objective function f(z) to be maximized is not a simple 
quadratic form as described in Eq. (1.3.11). This introduces a few compli- 
fpiease consult Ref. [She94] for the technical details and graphs. 



26 



Chapter 1. Quantum State Estimation 



cations to the simple CG algorithm outlined above. Firstly, the optimal value 
of e k is often not available readily as an analytical expression. Therefore, 
numerical methods have to be invoked in order to look for the value of e k 
such that f(zk+i) is maximal. In cases where such a numerical search for the 
optimal e k is computationally expensive, a fixed value of e k may be assigned. 
Secondly, we note that 

g k +i -g k = (1.3.12) 

when f(z) is a quadratic form. This follows from the fact that 



9k+i ■ 9k = 9k ■ 9k - tkhk ■ A ■ g k 

-. -, 9k- 9k ? -. 
= 9k ■ 9k - ^ — -. rtk ■ A ■ g k 
hk - A - h k 



ejk ■ 9k - t^ft^t^ • A ■ (h k - tkhk-i) 
h k - A-h k v J 

9k • 9k~ 9k- 9k = • 



Therefore, we have that 

t = 9k+i ■ 9k+i = 9k+i ■ (gfe+i ~ 9k) ^ 3 ^ 

9k - Jjk 9k - j)k 

Fletcher-Reeves Polak-Ribiere 
factor factor 

For a general function f(z), the two factors are clearly different. It is known 
that the CG algorithm which uses the Fletcher-Reeves factor (Roger Fletcher 
and Colin Morrison Reeves) converges only when the starting vector z\ is 
close to irnax, and that which uses the Polak-Ribiere factor (Elijah Polak and 
Gerard Ribiere) rarely diverges. This divergence can be prevented by defining 
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the Polak-Ribiere criterion 



f 9k+i ■ (fffe+l — 9k) n i , , .-, , r , 

7 fc+ i = max<^ — ,0} . (1.3.14) 

I 9k ■ 9k 



This implies that when the Polak-Ribiere factor becomes negative, the CG 
algorithm switches back to the DG algorithm. Putting the pieces together, 
we have: 

Polak-Ribiere CG method for general objective functions 

Beginning with h\ = g\ = V n f(zi) and k = 1, 

1. Compute e k using a line search procedure such that f(z k + e k hk) 
is maximal and set z k +\ = z k + € k h k . 

2. Set g k+ i = V n /(ifc+i). 

3. Set the parameter 7fc+ i = max ( ih+l^h+lzM q\ 

^ 9k'9k J 

4. Set h k+1 = g k+1 + jk+iH- 

5. Set k = k + 1 and repeat the iteration from the beginning. 

The main point of this short discourse is that the above algorithm can be 
generalized to the space of operators by simply replacing all numerical vectors 
by vector operators. The inner product of two vector operators X and Y is 
defined as (x,y\ = trl (x") y\, where the trace operation is understood 



to act on the operators in X and Y. To apply the conjugate-gradient strategy 

to ML, we first allow the operator vector Z = (^4,^) T to follow the search 

direction of the steepest ascent, namely SZ oc d log C({nj}; p). Subsequently, 

bZ will follow a series of approximately conjugate search directions defined 

by the dyadic dd log £({rij}; p) The standard Polak-Ribiere CG method, 

'''To visualize this more vividly, consider a quadratic form of three parameters, contained 
in the vector x, given by f(x) = — \x ■ A ■ x + b ■ x. Then, the three-dimensional gradient 
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when applied to ML, proceeds as follows: 

ML algorithm using the standard Polak-Ribiere CG method 
(ML-CG) 

Starting from the parameters A = 1, Z\ = [A,A^) T , G\ = H\ = 
d\ogC(pi) and k = 1, 

1. Compute Rk] 

• Escape from loop if tr{\RkPk — Pk\} < 

• Otherwise, proceed to the following steps. 

2. Optimize e& such that C({nj}; p^) is maximum using a line search 
procedure and set Z^+i = + tkHk- 

3. Set G k+1 = d log C({nj};p k+1 ). 

4. Set the parameter 7^+1 = max < (^ k+1 ' < ^ k + 1 ; q l (p G lak- 

! \G k ,G k ) I 

Ribiere) . 

5. Set Hk+i = Gk+i + 7fc+i-fffe. 

6. Set k = k + 1 and repeat the iteration from the beginning. 

We remind ourselves that the efficiency of the ML-CG method will be 
higher if the functional to be optimized is very close to a quadratic form in 
the space of parameters, in which case (Gk+i,Gk) ~ and 7^+1 > 0. Since 
the likelihood functional £({rij}; p) deviates far away from a quadratic form 
in Z, this term can be significant in value, causing 7^+1 to be constantly reset 
to and thereby turning the ML-CG method back to steepest-ascent. Hence 

V/(af) = 6— A ■ x and the search directions hj generated by the conjugate gradient method 
are related by VV/(aJ) = — A . 
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it is fruitful to consider a new Polak-Ribiere criterion, namely 

where £ is a suitably chosen parameter, which is less than 1, such that the 
factor £(Gk+i,Gk) is small. If £ is set to 0, corresponding to the Fletcher- 
Reeves scheme, the ML-CG algorithm may not converge. In general, the 
optimal value of £ that gives the minimal average number of iterative steps to 
achieve a certain numerical precision e depends very much on ptrue- In view 
of this, we set £ = 0.5 for any ptrue- Prom hereon, the ML-CG algorithm is 
defined with the new Polak-Ribiere criterion. Figure 1.1 gives a numerical 
simulation on a single-qubit state | ) = (|0) + |1) i)/V2, where |0) and |1) are 
two orthogonal kets. 
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Figure 1.1: Single-qubit state simulated with 10 3 detection copies over 100 experi- 
mental runs. We analyze the performance of ML-CG (■) in terms of the average 
number of iterations over the experimental runs. Here the precision e is set to 10~ 7 . 
In general, lower £ values can further boost the performance of both schemes. 



To investigate the performance of ML-CG numerically, Monte Carlo sim- 
ulations are carried out on a unitarily-invariant random ensemble of full-rank 
two-qubit mixed states. To generate each random mixed state ptme, we choose 
four random normalized kets {|V'fc)}fc=o an d f° ur random complex numbers 



30 



Chapter 1. Quantum State Estimation 



{ a k}k=o- Then each mixed state is defined as 



Ptvue 



k=0 



\ a k\ 



lev' \ v 
\ a k\ 



(1.3.16) 



where v is an integer parameter which we vary to obtain random mixed states 
of varying ranges of purity. To compute the optimal value of in the kth 
step, we evaluate the likelihood functional at ten different values of and 
perform a quadratic curve fitting to obtain the approximate maximum of the 
likelihood functional. For the POM outcomes, we use the tensor products 
of the single-qubit SIC POM (also known as the tetrahedron measurement) 
subnormalized projectors [TZE10, ZE11]. These four rank-1 outcomes of the 
tetrahedron measurement have Bloch vectors defined by 



"i 



i 

w 



,0,2 



V3 



-1 

v-v 



,a 3 



1 

71 



f-l\ 

-1 

V 1 / 



, CJ4 



V3 



1 



v-v J 

(1.3.17) 



This product measurement forms a minimal set of 16 informationally complete 
POM outcomes. 

All simulations are conducted with Mathematica on an Intel i7 Quad Core 
2.67 GHz machine. Figures 1.2 and 1.3 give the simulated results. Notice, 
however, that the improvement, in terms of average duration of each full 
run, of ML-CG over ML-DG I is generally smaller than that in terms of the 
average number of iterations required to complete a full run. The reason lies 
in the computation of matrix multiplications which can be significant in the 
conjugate gradient methods as D increases. Nevertheless, ML-CG shows the 
best average convergence rate for all the randomly generated two-qubit mixed 
states in terms of both the average number of iterations and average duration 
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Figure 1.2: A total of 1500 random two-qubit full-rank mixed state were simulated 
with eight thousand detection copies over fifty experimental runs. By fixing the 
precision e = 10~ 7 , the scatter plots of the average number of iterative steps over 
the experimental runs for ML-DG with fixed e fe = e (ML-DG I) (Red), ML-DG with 
optimized e fc (ML-DG II) (Blue) and ML-CG (Green) indicate an expected trend. 
For all the randomly chosen states, ML-CG outperforms ML-DG II with an average 
improvement of about 55%. On average, ML-CG requires about 95% less number of 
iterative steps than ML-DG I for the same precision. 
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Figure 1.3: Here is the corresponding plot of the average duration of one complete run 
of each of the three schemes: ML-DG I(Red), ML-DG II (Blue) and ML-CG (Green). 
The average improvement on which ML-CG outperforms ML-DG II, in terms of the 
average duration of one complete run, is about 65%. The corresponding improvement 
of ML-CG over ML-DG I is about 75%. 



compared to all other schemes. 

Next, we present two sets of simulation data for four-qubit tomography 
on the GHZ and W states. Let us emphasize that as the dimension of the 
Hilbert space increases, the computational cost for evaluating large matrices 
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GHZ state 



ML-DG I ML-DG II ML-CG I ML-CG II 

Iterations 112 42 39 33 

Duration / s 1.069 4.416 0.688 4.113 

W state 



ML-DG I ML-DG II ML-CG I ML-CG II 
Iterations 936 133 258 154 

Duration / s 9.288 14.330 4.759 18.123 



Table 1.1: Table of the average number of iterations and average duration to complete 
one full run of the respective iterative schemes for the four-qubit GHZ and W states. 
The POM for the simulations consists of the tensor products of four single-qubit 
tetrahedron outcomes. The above illustrates that on average, ML-CG I, which is 
ML-CG with fixed e^, performs better, in terms of the average duration of a full 
run, than the regular direct and conjugate-gradient schemes with optimization, 
even though the average number of iterations can sometimes be significantly reduced 
using the optimized schemes. The additional time taken for the type II algorithms is 
mainly due to the heavy matrix evaluations in the line search procedure. 



becomes more significant, especially in the likelihood functional computations 
required for the quadratic interpolation procedure. This is eminent in four- 
qubit state estimation. In this case, we also consider performing ML-CG with 
fixed £fc to reduce the overall time required to compute a full run of the al- 
gorithm. Finally, we compare the performances of ML-DG and ML-CG by 
performing quantum state estimation on one simulated set of measurement 
data for an eight-qubit pure state [HHR+05] with MATLAB. The POM used 
is the set of 2 16 = 65536 different tensor products of eight single-qubit tetra- 
hedron outcomes. In practice, it is difficult to store all the 65536 outcomes 
into memory on a personal computer and so we generate all these outcomes 
on the fly in each iterative step of the algorithms. In addition, the evalua- 
tion of these 256 x 256 operators is extremely costly. These factors, together, 
cause a tremendous slowdown in the durations of the algorithms. Hence, the 
type I algorithms are naturally more practical in this situation than type II 
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algorithms. The simulations show that ML-DG I takes about 143 hours to 
complete the run up to a fixed numerical precision e, whereas ML-CG I takes 
about 95 hours to achieve the same precision. Thus, ML-CG I does in fact 
offer a more optimistic alternative for quantum state estimation involving 
quantum systems living in large Hilbert spaces. It is important to note that 
the conjugate- gradient methodology we have presented in this section is ap- 
plicable to any algorithm that is based on the steepest-ascent method, as the 
machineries established are a natural extension to those of steepest-ascent. 

r- SECTION 1.4 1 

Informationally incomplete quantum state 

estimation 



If the POM used for measurement is informationally complete, then there 
exists a unique estimator pml > that maximizes C({nj}; p). However, if 
the POM is not informationally complete, then there are infinitely many es- 
timators that maximize £({nj};p) for a given set of fjS. In fact, because of 
the concavity of C({nj};p), the existence of two such estimators p\ and p2 
implies the existence of a continuous family of estimators p' = \p\ + (1 — X)p2, 
where < A < 1. Therefore in order to systematically choose one esti- 
mator for statistical prediction, we shall consider the principle of entropy 
maximization (ME) that goes way back to two papers by Edwin Thompson 
Jaynes [Jay57a, Jay57b] in 1957. In doing so, one can always obtain a unique 
estimator that maximizes both C({nj}; p) and the von Neumann entropy func- 
tional S(p) = — trjplogp} (John von Neumann). J. Rehacek et al. had looked 
into this ML-assisted ME technique in particular for commuting POM out- 
comes [RH04] and the photon- number statistics of light [HR06] . This section 
develops iterative schemes that are applicable for general situations. 
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1.4.1 General iterative scheme 

The original idea of ML-assisted ME considered by J. Rehacek et al. involves 
two steps. The first step is to perform the ML procedure in order to look for 
the estimators /5ml that maximize C({rij};p) given a fixed set of measured 
frequencies fjS from a informationally incomplete POM with K outcomes. 
In this case, there are infinitely many such ML estimators and as a result, 
the likelihood functional forms a plateau on the space of statistical operators. 
The second step is to select the estimator with the maximum value of S(p). 
Such a procedure is equivalent to raising the plateau into a concave hill so 
that the resulting estimator chosen gives the globally maximum value. In this 
way, a unique maximum-likelihood-maximum-entropy (MLME) estimator can 
always be obtained for statistical predictions. We do this by considering the 
Lagrange functional (Joseph-Louis Lagrange) V involving the von Neumann 
entropy functional S(p) and the constraints tr{pMiJlj} = tr{pIL,} = pj as 
well as tr{p} = 1 defined as 



V(p) = -tr{plogp} + £ Xj fa - tr^MLlL;}) + log^ (tr{p} - 1) , (1.4.1) 



3 

where the A^s and log p are the Lagrange multipliers for the constraints. Vary- 
ing V yields 



Thus, setting 5T>(p) to zero gives the maximum-entropy (ME) estimator of 
the form 




(1.4.2) 




(1.4.3) 
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which maximizes S(p) under the set of constraints, after setting 

tr J e ^ ^n 3 i 

The task now is to look for the Lagrange multipliers using the above con- 
straints. This requires the solutions to a set of nonlinear equations which in 
general may not be conveniently obtained, especially when the operators Hj 
do not commute. 

An alternative idea is to maximize the likelihood functional £({rij};p) 
by optimizing Xj of the estimator in Eq. (1.4.3) so that the resulting MLME 
estimator pmlme is the one that maximizes C({nj}; p) and is automatically the 
maximum-entropy estimator. An interesting observation^ is that the Lagrange 
multipliers are not all independent. This stems from the completeness of the 
POM Y2j Hj = 1 which implies that there are altogether K — 1 independent 
constraints for the Lagrange multipliers. As such one may choose to optimize 
only K — 1 Lagrange multipliers. 

Varying log C({rij}; p) yields 

8 log C({nj};p) = Ntr{R8p} , (1.4.5) 
where R = fj^-j/Pj- Using p of the form in Eq. (1.4.3), the variation 

with O = e-^j j j , involves the variation of O and this is carried out by noting 



^Thanks to Dr. Ng Hui Khoon, a research fellow in CQT, for pointing this out. 
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that given an operator B, 

&e B = f dxe^-^ B 5Be xB . (1.4.7) 
J o 

Substituting Eq. (1.4.6) into Eq. (1.4.5), the resulting variation of the 
log-likelihood functional log £({rij}; p) is derived to be 

&logC({ nj };p) =iV^6A J tr| / 9n j (J d.r e' >- A 11 //<■ ' A n - - 1 

(1.4.8) 

where N is the number of detection copies of the quantum system. One can 
immediately find that the derivative of log C({rij}; p) with respect to Aj is 

■^-logC{{ nj };p) = NtvLuj (J dxe x ^i x ^Re- x ^i x ^ - 1 

(1.4.9) 

Hence the maximal value of log £({nj}; p) is attained when the extremal equa- 
tion 

f j A MLME n,' £, -xJ2 A MLME n, -, rt a m\ 

/ die^i > -Rmlme e ^ j J = ^mlme (1.4.10) 
J o 

is satisfied, where -Rmlme = (pmlme) and lp MLMB is the identity operator on 
the support of pmlme- This is of course obvious in hindsight since Eq. (1.4.10) 
is equivalent to the statement 

-Rmlme /5mlme = pmlme -Rmlme = Pmlme (1.4.11) 

as in the case of ML. 

With the above setting, we can construct an iterative scheme MLME based 
on the principle of steepest-ascent. Since the SXjS are arbitrary, we can set 
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each variation as follows: 

d 

8\j = eNdj log£({nj}; p) = e— logC({ nj }; p) 

= etrj/TL, d.re'- X U h'v '>- : xu - - 1^ j , (1.4.12) 

where e is a positive parameter which defines the step size taken in every 
iterative step. So now the iteration proceeds by a step of size e along the 
direction of the gradient dj log £({nj}; p) in each step. We thus have the 
variation 8 log £({nj} ; p) to be always positive. The MLME scheme is then 
given by 

Scheme A 

Pk+l = — 7 1-^7, s — n , (1.4.13) 

tr f e E, (Af +ed j log C({n 3 };p k )) Uj 1 

djlogCdnjbpk) = Ntr^p k Uj ^j\x e x ^ X ^ R k e~ x ^ X f >U ^ - lj | . 

(1.4.14) 

As in the ML iterative scheme, one can always start from the maximally-mixed 
state. 

The fruit of the above discussion is an iterative scheme that looks for the 
MLME estimator directly rather than taking the ML-assisted ME approach 
which involves two steps and a set of nonlinear equations. This iterative 
scheme is conveniently applicable for general POMs and tomography in any 
Hilbert space dimension. In general, the CPU time for exponentiating a square 
matrix is acceptable even for matrices as large as 10 x 10 using commercial 
optimized algorithms. The only practical shortcoming in this scheme is the 
long CPU time required to perform the numerical integration in each itera- 
tive step and this can be quite serious as the dimension of the Hilbert space 
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increases. One possible way of circumventing the problem is to approximate 
the variation of the matrix exponential of an operator A as^ 

5e A « 1 (e A 5A + 8 A e A ) . (1.4.15) 

Then the direction of ascent in every step of Scheme A is given by 

^•log£(K};p fc ) = mJ Pk ^ R * + R ^ _ n ^ | . (1.4.16) 

In this way, the integration procedure can be avoided. 

At this point, we would like to make a distinction between this MLME 
technique and the conventional ME technique [BAD96, RP05]. The ME tech- 
nique takes the outcome frequencies fj as the probabilities pj and tries to 
search for the positive operator in Eq. (1.4.3) by maximizing S(p), subjected 
to the probability constraints which are mediated by the Lagrange multipliers 
Xj. The fundamental problem with this scheme is that, in general, the fjS 
cannot be treated as probabilities since they correspond to an operator which 
is not necessarily positive. This is due to the statistical noise which is inherent 
in the outcome frequencies arising from measuring finite copies of quantum 
systems. Therefore, in such cases, the ME technique fails as there simply is 
no positive operator which is consistent with the measurement data to begin 
with. The MLME algorithm, on the other hand, looks for the unique MLME 
estimator by confining the search within the plateau region inside the Hilbert 
space. Thus, positivity is ensured. In cases where the fjS are probabilities, 
both the ME and MLME schemes yield the same estimator by construction 
since the estimated probabilities pj = fj correspond to a statistical operator. 

We compare the MLME scheme with the standard ME scheme using the 

^Thanks to Zhu Huangjun who suggested this approximation. 
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simple example of a trine POM denned by the equations 

n = I (i + a z ) , 

= d.4.17) 

where the Pauli operators (Wolfgang Ernst Pauli) o~ x , a y and a z are given by 

/ 

| , o- x = 

i 




' -i 



V 



1 o 

-1 



(1.4.18) 



A straightforward calculation shows that when uq = 6, n + = 2 and n_ = 1 
after measuring N = 9 copies for instance, the standard ME scheme fails as 
no quantum state has the frequencies /o = 2/3, /+ = 2/9 and /_ = 1/9 as 
probabilities. On the other hand, the MLME scheme still gives a positive 
estimator described by the Bloch vector (0.194, 0, 0.981) for those frequencies, 
thus showing its versatility. Only when the frequencies are probabilities giving 
positive estimators may we use the ME scheme and in this case, the MLME 
scheme naturally incorporates these constraints. 

1.4.2 Qubit tomography 

In this example, to benchmark the MLME iterative scheme, qubit tomography 
simulations are performed using the trine POM defined in Eq. (1.4.17). In 
this case, no expectation value is measured along the y direction in the three- 
dimensional Bloch representation. One can easily show that the maximum- 
entropy estimator />me for the true state will always be represented by a 
real and positive matrix by simply minimizing the purity of the estimator 
since for any qubit statistical operator, a decrease in its purity corresponds 
to an increase in its entropy. In the simulation, we fix N = 10 6 , e = 10 and 
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(|0) + |l)i)/V2, where |0) =(1,0)' and |1) = (0, l) 1 . Therefore 

/ 



Ptruc 



0.5 -0.5 i 
0.5 i 0.5 



(1.4.19) 



Since this is an eigenstate of a y , we will ideally have {a x ) = (a z ) = 0. This 
implies that pj = (ILj) = 1/3 and we expect the final MLME estimator to 
be the maximally-mixed state. The MLME Scheme A gives the unique 
estimator 

/ 

0.499953 -0.000169978 
-0.000169978 0.500047 



/5MLME : 



(1.4.20) 



which is consistent with the expected result, under an iteration time of 0.015 s 
using the approximated gradient expression for a precision e = 10 -7 in a 
particular simulated experimental run on an Intel(R) Core(TM) i7 2.66 GHz 
computer using Mathematica. 

1.4.3 Two-qubit tomography 

For simplicity, we consider two different informationally incomplete POMs. 
The first POM consists of nine outcomes that are tensor products of a 
pair of qubit trine POM outcomes as in Eq. (1.4.17). In this case, 
there will be expectation values for observables which depend on a y like 
(cr x (g) cry), etc. However, as it turns out, the MLME estimator is still 
a real statistical operator in the computational basis, with the six expec- 
tation values tr{/5 M LME (1 ® cr y )}, tr{p M LME (cr y <8> 1)}, tr{p M LME (v x ® cr y )}, 
tr-fpMLME (fi/ 8> &x)}, tr{p M LME {?y ® cr z )} and tr{p ML ME (o- z <8> cr y )} all equal 
to zero. 

For the second POM, we emphasize the versatility of the MLME scheme 
by choosing a random POM consisting of nine outcomes by first generating 



1.4- Informationally incomplete quantum state estimation 



41 




(a) True state 



(b) Reconstructed state 



Figure 1.4: Two-qubit tomography using joint trine POMs consisting of nine out- 
comes. A Monte Carlo simulation is performed with the number of detection copies 
N = 10 6 on a random true state described by a real statistical operator. The ver- 
tical axis represents the real matrix elements for both the true and reconstructed 
statistical operators in the computational basis. The horizontal axes respectively 
represents the row and column labels of the matrices. The trace-class distance 

£>tr = tr{|/5 M LME ~ /0truc|}/2 IS 0.158. 

nine random complex operators Bj and then denning 



where x = Y2k^k^ k ' Care has to be taken to ensure that \ nas f un rank, 
which is the typical situation if the operators Bj are randomly chosen. Using 
this POM, the maximum-entropy estimator is in general a complex statistical 
operator. The results are shown in Figs. 1.4 and 1.5. The two figures show 
that the reconstructed statistical operators are in general close to the true 
statistical operators. 

1.4.4 Imperfect measurements 

In a practical tomography experiment, the detectors used are less than perfect. 
Typically, detection imperfections can be summarized using a set of positive 
numbers {rjj}, where rjj < 1 is the detection efficiency for a particular POM 
outcome Uj. More generally, one can describe a POM with more sophisticated 




1/2 
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(a) True state (b) Reconstructed state 




(c) True state (d) Reconstructed state 



Figure 1.5: Two-qubit tomography using a random two-qubit POM consisting of 
nine full-rank outcomes. A Monte Carlo simulation is performed with TV = 10 6 
on a random true state represented by a complex positive matrix of unit trace. The 
vertical axis in each of (a) and (b) represents the real matrix elements of the respective 
true and reconstructed statistical operators in some computational basis and that in 
each of (c) and (d) represents the respective imaginary matrix elements. In this case 
P tr = 0.206. 
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imperfections by introducing the efficiency matrix M„, with positive matrix 
elements satisfying the inequality 

5>i*<l. (1-4-22) 

3 

After obtaining these matrix elements through calibration, one can define a 
new set of outcomes 

n i = I>j-k n * (1.4.23) 

k 

such that G = J2 k U 'k < 1,1 • Therefore, ^jPj = Ej tr {p n i} < L 

As a consequence to these imperfections, we would not know the true total 
number of copies iV true that have reached all detectors. Denoting the total 
number of copies registered by the imperfect detectors by A, we can write 
down the likelihood functional for this scenario, with no emphasis on any 
particular sequence of detector clicks, as 

(\ / \ iVtrue— N 

II ) ( ) ■ 

where the indices here run over all outcomes and we define rj = < 1 to 

be the overall detection efficiency. The multinomial factor takes into account 
all possible sequences of having detected copies out of the total of A true 
copies sent to all detectors. Using Stirling's formula (James Stirling) log A! ~ 



'There are, of course, other types of experimental imperfections, such as the non- 
uniformity in the thickness of wave plates, the instability of the phase modulator, etc., 
that can affect the result of state estimation. These imperfections, in principle, can all be 
accounted for with the POM outcomes U'j. 
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N log N — N, the log-likelihood can be simplified to 



logman,}; p) = N fj logPj + (N tmc - N) log(l - r,) 



I N N tlue \ 

+ log U~(iv t , u ,'-V..„-, ■ (i^*) 



where J2j fj = J2j n j/ N = l - 

Performing the variation, we have 



8 log £'({n,}; p) =iV £ ^ - — — ^ «Pi 



+ 5AT true log(l - rj) + SNtmc log iV trU c - 6iV tmc log(iVt rU c - N) 

j\Pj 1 - V J V ^truc - N ) 

=tr( (NR - ^H G ) 6,1 + 6iV truc log ' < X " ^ 



1-?? J ' ) V ^truc"^ 

(1.4.26) 



Setting Slog £ ({nj}; p) to zero, i.e. maximizing log £' ({rij} ; p), requires the 
derivative 

^'<M^^^5r) (1 „ 7) 

to be independently zero. This implies that the extremal equation 



AW = - (1-4.28) 
V 



has to be satisfied, which is a rather natural statement since the likely number 
of copies that are actually received by the imperfect detectors is, of course, 
the true total number multiplied by the overall detection efficiency that is less 
than one. Then the resulting expression for 5 log £'({rij}; p) further simplifies 
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to 



6 log £'({«.,•}; p) = iVtrj^R' - hl^j 6p| , (1.4.29) 

where R' = Ylj fj^'j/Pj an d the operator —G/i] accounts for inefficient detec- 
tions. 

We may naively make use of Eq. (1.4.3) to derive the following scheme: 
j: i (^ k) +ed j log/:'({n j };p k ))n j 



fl.4.31) 



with the index j running over all POM outcomes. However, it turns out that 
there are many different sets of probabilities pj that maximize \ogC({nj}; p) 
for a fixed set of measured data and hence multiple MLME estimators. We 
first note that the log-likelihood functional log £({7^}; p), after an application 
of the Stirling's formula on the factorials, is a concave function in pj since 

\og£\{ nj }-p) =yV-log (rP—) (1-4.32) 

j Kl^kPkJ 

and each logarithmic term in the sum is concave in pj . Hence concavity is not 
the cause of the existence of non- unique extremal pjS. To identify the root 
of the problem, we look at the derivatives of log £'({nj}; p) by differentiating 
Eq. (1.4.32) with respect to pj, i.e. 

|-,„ g r<( W; p)^__^. <^ 33 > 
Then an extremal solution of pj satisfy the above equations with 
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d log £,'({rij}; p)/dpj = inasmuch as 



Pj 



(1.4.34) 



f> 



k 



For K — 1 detected POM outcomes, there are altogether K — 2 indepen- 
dent equations and one normalization constraint for the full set of pjs. From 
Eq. (1.4.34), it is clear that the total number of available equations which 
are independent is K — 1 and thus, there exist infinitely many solutions for 
these reduced set of equations, for the number of independent variables is now 
more than the number of independent equations. A simple example is a set 
of three POM outcomes, with 713 = ^3 = for the third outcome. Then the 
only independent equation involving the probabilities is 



and hence, there are infinitely many solutions of p\, P2 and pz = 1 — pi — P2 
which maximize F. 

In other words, we have infinitely many sets of solutions for pj, with each 
set giving rise to a unique MLME estimator. The task is then to select the 
MLME estimator that has the highest entropy out of the continuous family 
of MLME estimators. To do this, we first realize that Eq. (1.4.34) simply 
implies that the ratio Pj/ fj equals a constant value for 1 < j < K — 1. Hence 
a scaling transformation on a reference set of solutions p^Q with a continuous 
parameter a such that 




(1.4.36) 



also gives another set of solutions which satisfy Eq. (1.4.34). The resulting 
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maximum entropy estimator is obtained by varying the Lagrange functional 



K-l 



V(p) = - tr{plo g/0 } + X 3 {PJ ~ «^ M o L ) + *K 
+ log / u(tr{/}} - 1) 



K-l 



PK 



3=1 



(1.4.37) 



and later setting the variation zero. In this way, the parameter a is optimized 
to give an estimator with the highest entropy among the family of MLME 
estimators. It follows that the extremal equation, after a variation in a, is 
given by 



where 



This implies that 



K-l 



Pi 



Pjfi 



( v a Hi, -.^iik) 



Pme 



(1.4.38) 



(1.4.39) 



(1.4.40) 



tr | e E^,(n 3 +ftn K) j • 

Taking the ME estimator of the form in Eq. (1.4.40), one can derive an 
iterative scheme to maximize log£({nj}; p) which is given by 



Scheme B 



E, UP+edj log C'({nj};p k )) (U, + J S i n Jf ) 



Pk+1 



tr< e 



E i ( 5 +*dj log C ({nj };p k )) (n 3 - +/3jTl K ) 



dj log ({rij}; p k ) =Ntrl p k Uj J dx 



(> .E,Af(n j+ ; ( ii ;il(i?fe ___ G 



(1.4.41) 



1 



(1.4.42) 
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where the extremal equation to be satisfied by />mlme is 



-RmlmePmlme = -Gpmlme- (1.4.43) 
V 



With the approximation supplied by Eq. (1.4.15), the gradient can be approx- 
imated to 

dj log £({nj};pk) 

(ILj + ^Uk) {Rh ~ ^)G) + (R h - ^G) (IL, + frllK) | 



Nti{ Pk - 



(1.4.44) 



Since this scheme is independent on the choice of p^o , one may first perform 
ML starting from the maximally-mixed state and make use of the resulting 
set of ML probabilities to carry out Scheme B. 

To demonstrate the results of the scheme, we first ran a single simulated 
experiment involving the measurement of 5000 copies of qubits prepared in a 
random state using of a random three-outcome POM, with one of the POM 
outcomes not registering any qubit. Post-processing the data with Scheme 
B indeed gives the MLME estimator which has the highest entropy among 
all other estimators generated using the former naive scheme by varying the 
starting state for each iteration. The result is shown in Fig. 1.6. 

Fig. 1.7 compares the performances of Scheme A, with which we search 
for the MLME estimator by assuming that the measured data rij are all we 
have while ignoring the possible missing data, in qubit tomography using the 
trace-class distance 

Ar = ^tr{|p M LME - /Otrue|} (1.4.45) 
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Figure 1.6: A simulated experiment on a random state, in which 5000 qubits were 
measured using a random imperfect two-outcome POM. The plot markers, which are 
indicated by dots, represent the entropies of the MLME estimators generated by the 
naive scheme starting from random states in the uniform distribution with respect to 
the Hilbert-Schmidt measure. 10 3 such estimators were computed. The thick solid 
line represents the entropy of the MLME estimator generated by Scheme B. 




as the figure of merit to quantify the distance between /6mlme and ptrue- The 
lesson here is that if one neglects the consequence of imperfect measurements 
in performing state reconstruction, the quality of the resulting reconstructed 
state estimator will typically be much lower than that obtained from a scheme 
which accounts for this imperfection. 

In a typical experiment, all detectors are controlled to have the same 
efficiency ry,j. = ^o^jfc- I n this special setting, the operator R' in Eq. (1.4.29) 
further simplifies to 

R> = R+ ' i —J^. (1.4.46) 
Vo 

Incidently, the term that is a multiple of the identity operator does not af- 
fect the likelihood maximization procedure at all, and we will obtain exactly 
Scheme A for the incomplete set of data. In other words, since all the detec- 
tors have indistinguishable efficiencies, we can consider this special setting as 
the situation in which the observer has a complete set of measurement data 
that is less than that for the case when all detectors have 100% efficiency. 
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Figure 1.7: A comparison of two different schemes for a fixed random incomplete 
POM with 10 3 random qubit true states distributed uniformly with respect to the 
Hilbert-Schmidt measure. Fifty experiments were simulated for every true state, 
with N = 5000 for each experiment, and the average trace-class distance T>^ s was 
computed. The entire simulation was done with a set of randomly generated, in- 
formationally incomplete POM consisting of three outcomes. A POM outcome was 
discarded to simulate the situation in which two functioning detectors out of the three 
are registering the qubits. The plot markers denoted by "+" represent reconstructed 
states using Scheme A, and those denoted by "□" represent the reconstructed states 
using Scheme B. The missing probabilities estimated by the reconstructed states 
using the Scheme B are typically closer to the missing frequencies that would be 
measured if the discarded detector was functioning compared to those estimated by 
the reconstructed states using Scheme A. About 80% of the total number of true 
states respond better under the second scheme. 



1.4.5 A new perspective 



Previously, we described the original idea of the ML-assisted ME procedure 
for a set of informationally incomplete measurements in a given quantum 
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tomography experiment, that is the selection of the most-likely state estimator 
with the highest von Neumann entropy as the least-biased state estimator. 
Such a procedure usually involves complicated systems of non-linear equations 
which are especially hard to solve for non-commuting measurement operators. 

We then established novel and more feasible schemes, via the steepest- 
ascent approach, which are suitable for any set of measurement operators, to 
obtain the same result by maximizing the likelihood functional over the space 
of statistical operators, with each operator assuming the form that maximizes 
the von Neumann entropy functional for a fixed set of probabilities. This 
MLME approach, which effectively condenses the ML and ME optimization 
procedures into one, can in fact be slow. This is due to the fact that the 
proposed MLME algorithm proceeds along a search path that deviates away 
from steepest-ascent because of the approximation in Eq. (1.4.15). 

In the subsequent sections, we establish more efficient MLME algo- 
rithms by viewing the problem of MLME in a different perspective [TZE + 11, 
TSE + 12]. We then apply these new algorithms to several different situations. 

1.4.5.1 A new algorithm for perfect measurements 

Assuming that the measurement detections are perfect, one can consider the 
optimization of the normalized log-likelihood functional log(£({nj}; p))/N, 
with £({nj}; p) defined in Eq. (1.2.12). The motivation for introducing 
the normalization will become clear soon. The MLME scheme can then 
be perceived as a standard constrained optimization problem: maximize 
log(£({nj}; p))/N subjected to the constraint that S(p) takes the maximal 
value S'max- The Lagrange functional for this optimization problem is defined 
as 

l(\;p) = \(S(p)-S max ) + ±-\og/:({n j };p), (1.4.47) 
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where A is the Lagrange multiplier corresponding to the constraint for 
S{p). This is equivalent to maximizing S(p) with the constraint that 
log(£({nj}; p))/N is maximal, as discussed previously. We denote the estima- 
tor that maximizes X(A; p) by p\\. Incidently, as a result of the normalization 
of log(£({nj}; p)), the functional I(X; p) is a sum of two different types of 
entropy, up to an irrelevant additive constant ^ ■ fj log fj : the von Neumann 
entropy S(p) that quantifies the "lack of information" , and the negative of the 
relative entropy S({fj}\{pj}) = fjlog(fjfpj) that quantifies the "gain of 
information" from the measurement data. The scheme can now be interpreted 
as a simultaneous optimization of two complementary aspects of information, 
with an appropriately assigned constant relative weight A. In addition, the 
normalization of log£({nj}; p) renders the optimal value of A to be indepen- 
dent of N. 

When A = 0, we recover the Lagrange functional for the log-likelihood 
functional alone. Owing to the informational incompleteness of the measure- 
ment data, there exists a convex plateau structure for the log-likelihood func- 
tional. As A — > oo, the von Neumann entropy becomes increasingly more 
significant and the resulting estimator /S^a-^-oo approaches the maximally- 
mixed state 1/D. Naturally, when A takes on a very small positive value, 
the contribution from XS(p) becomes much smaller than log(C({rij}; p))/N 
and the effect of the von Neumann entropy functional is only significant over 
the plateau region in which the likelihood is maximal. Figure 1.8 illustrates all 
the aforementioned points. This means that, in general, A should be chosen so 
small that S(pi t \) takes a value that is very close to the minimum, and below 
which there are only very slight changes in the two entropy functionals. The 
methodology to select an appropriate value of A will be discussed in §1.4.5.3. 

Let us derive the iterative algorithm for maximizing X{\ — > 0; p) with 
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(c) A = 1(T 3 

Figure 1.8: Schematic diagrams of I(X, p) on the space of statistical operators. The 
maximally-mixed state resides at the center of the square base which represents the 
Hilbert space. At the extremal points of A, X(X = 0; p) = log(£({rij}; p))/N, with a 
convex plateau at the maximal value, and T{X — > oo; p) = XS(p). Plot (c) shows the 
functional with an appropriate choice of value for A for MLME. An additional hill-like 
structure resulting from S(p) is introduced over the plateau, so that the estimator 
with the largest entropy can be selected from the convex set of ML estimators within 
the plateau. 



54 



Chapter 1. Quantum State Estimation 



respect to p. After varying I(\ — >• 0;p), we have 



6Z(A->0;p) = -Atr{5plogp} + V ^-Spj . (1.4.48) 



The variations Spj, or 5p, have to be such that p stays positive after these 
variations. With the help of the parametrization in Eq. (1.3.3), we find that 



where 



with 



<K = R- 1 - A(logp-tr{plogp}) (1.4.50) 



R = E f i ^ ■ (1.4.51) 

When Z(A — > 0; p) is maximal, we have 5I(\ — )■ 0; p) = and the extremal 
equations 

pJK = 9tp = (1.4.52) 

are satisfied. Therefore, to solve these extremal equations numerically, we 
iterate the equation 

(A + bA k ) (A k + 8A k ) 

Pk+i = -fV- fv r (1.4.53) 

tr|^ + 5 ^Tj (^4 fc + 6^4 fc )| 

starting from some statistical operator p\, until k = k' such that the norm of 
Pk'Jik' is less than some pre-chosen value. We then take pmlme = Pk' as the 
MLME estimator. Maximizing I(X —> 0; p) will require 81 (A — >■ 0; p) to be 
positive whenever X(A — )• 0; p) is less than the maximal value. A straightfor- 
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ward way to enforce positivity is to set 

5 A k = (54) f = eAkXk oc , (1.4.54) 

with e being a small positive constant. This is the steepest- as cent method. We 

have thus established a numerical MLME scheme as a set of iterative equations 

(1.4.53) and (1.4.54) to search for the MLME estimator using the measurement 

data obtained from perfect measurement detections. More compactly, the 

relevant iterative equations are 

New MLME iterative equations for perfect measurements 

= (l + em k )p k (l + e<R k ) 
Pk+1 tr{(l + e<K fc )p fc (l + e<K fc )} ' 

V\ k = R k -l - A(logp fc - trjjOfelogpfc}) . (1.4.55) 
There exists an interesting structure in these MLME estimators and to 
explore it, one needs some knowledge on the structure of the POM used and 
its influence on the D-dimensional Hilbert space. Suppose a set of K POM 
elements IT,- are informationally incomplete. A consequence of this is that the 
number of linearly independent ILjs is less than D 2 . As discussed in §1.2, to 
determine their linear independence, we can look for the eigenvalues of the 
K x K Gram matrix 9JT whose matrix elements are defined as 



m jk = tr{n 3 n fc } . (1.4.56) 

Thus, a set of informationally incomplete IljS acting on the Z?-dimensional 
Hilbert space is such that the number of positive eigenvalues of 9Jt, denoted by 
n>o, is less than D 2 . Any D-dimensional positive operator can be represented 
by a set of D 2 Hermitian basis operators Tj satisfying the trace-orthonormality 
condition tr{PjTfc} = 5j k . For dimension two, an example of such a basis is 
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the the familiar set of four operators l/\/2, a x /V~2, a y /V2 and a z /V2. Once 
the number of independent measurement outcomes n>o is known, one can 
construct a set {r,,}™>° of n>o trace-orthonormal Hermitian basis operators 
directly from the K POM elements. In other words, each of the K POM 
elements can be expressed as a linear combination of the n>o basis operators 

where all coefficients ajk are real. This implies that the n>o-dimensional 
subspace is spanned by the basis operators that uniquely specify the POM 
outcomes. We will coin this subspace the measurement subspace. The rest of 
the D 2 — n>o Hermitian basis operators, which are trace-orthonormal to the 
previous set and span the subspace, that is complement to the measurement 
subspace can also be constructed. 

Suppose a state estimator /3ml is generated using the ML procedure on a 
set of measurement data obtained from the POM outcomes H,. We can rep- 
resent this estimator by a set of Hermitian trace-orthonormal basis operators 
inasmuch as 

n >0 D 2 
PML = E C ^+ E (1.4.58) 

k=l fc=n>o+l 

" ^ ' S * ' 

=PML =PME 

The part pml resides in the measurement subspace, which is spanned by the 
measurement outcomes Uj giving the measurement data, and is uniquely fixed 
for all ML estimators by the ML procedure for the same set of measurement 
data. The part pme resides in the complementary subspace, which is orthog- 
onal to the measurement subspace, and thus does not contribute to the pjS. 
In other words, trjpMEH,} = and this can imply the existence of a family 
of Pmes that gives the same set of ML probabilities as long as the /Smls are 
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positive. 

Therefore, the MLME scheme can be understood as an optimization over 
the complementary subspace to maximize S(p) under the constraint />mlme > 
0. However, one notes that only certain sets of c^ E s are allowed during the 
optimization in order to obey this positivity constraint. This is especially 
important when />mlme is rank deficient and lies on the boundary of the state 
space. Geometrically, the plateau of most-likely states is generally a much 
smaller subspace contained in the complementary subspace. In some cases, 
this plateau contains a single ML estimator because of the positivity constraint 
even when the measurements are informationally incomplete. In general, the 
boundary of the plateau is complicated and deserves further study. 

1.4.5.2 A new algorithm for imperfect measurements 

In actual experiments, as discussed previously, the measurement detections 
will usually be imperfect in the sense that the detection efficiency m < 1 of 
a particular measurement outcome n., is less than unity. In this case, the 
overall outcome probabilities 

p j =ti{pU' j } (1.4.59) 

will not sum to unity. Hence, we have a set of POM with outcomes n^- = rjjUj 
such that G' = J2j ^ < 1. A consequence of this is that the true total number 
-Ntrue of copies received is not known, since only N < At rU e are detected 
(N = At rue when all rjj = 1 as in §1.4.5.1). 

From §1.4.4, the correct form of the likelihood functional for this situation 
is given by 

J) ( L46 °) 
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up to an irrelevant multiplicative factor, with its corresponding logarithmic 
variation 

51og£'({n i };p) = JVtrj (r! - ~) SpJ (1.4.61) 

with i?' = fjTlj/pj. The additional term — bpG'jr] in the argument of the 
trace accounts for copies that have escaped detection. 

Defining X(A 0; p) for the new POM and its £'({%}; p) in Eq. (1.4.60), 
one can derive the iterative equations 

New MLME iterative equations for imperfect measurements 

(1 + 6^)^(1 + 6^) 

tr{ (1+^)^(1 + ^)}' 
=R' k -- m - X(logp k -tT{p k logp k }) , (1.4.62) 

with = YljPj ■ We n °te that more efficient algorithms, using the 

conjugate- gradient method, can be derived from these steepest-ascent algo- 
rithms using the machineries introduced in §1.3.2. 

1.4.5.3 Applications 
Homodyne detection tomography 

To discuss the methodology of choosing A, we shall apply the MLME scheme 
to homodyne detection tomography, a technique which is used to reconstruct 
quantum states of light [SMBF93, OTBG06, NNNH+06]. This is typically 
done by measuring a POM which resembles a set of eigenstate projectors 
\x$) {x#\ of quadrature operators X$ = Xcos$ + Psini? for various i? values, 
where X and P are respectively the position and momentum quadrature op- 
erators and x and i? are parameters specifying these projectors. Introducing 
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the standard annihilation operator A = (X + iP) / y/2, we have 

X, = Ae j_ Ae . (1.4.63) 

To facilitate the numerical simulations with the eigenkets \x#), the correspond- 
ing quadrature wave functions (x$\n) in the Fock representation are needed. 
To obtain these wave functions, we first note that the product of A and the 
function f(A^A) satisfies the relation A f(A^A) = f(A^A + l) A since, for any 
Fock ket \n), 

Af(A^A)\n) = f(n)A\n) 



= \Jn — 1 f(n) \n — 1) 

= Vn~^lf(A^A + 1) \n - 1) 

= f(A*A+ l)A\n) . (1.4.64) 



From this relation, we realize that 



I, = e' iMU Xe iMU , (1.4.65) 



and its corresponding quadrature eigenket \x$) is thus obtained via a unitary 
transformation 

Ixo) =e~ MAtA \x) (1.4.66) 

of the corresponding eigenket \x) of the position quadrature operator X. 
Hence, in the Fock representation, the corresponding quadrature wave func- 
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tions are given by 

(n\x#) = e~ M (n\x) 

= ; 1 , e - in V x2/2 HJx) , (1.4.67) 

where H n {x) are the Hermite polynomials (Charles Hermite) of degree n. 

It is clear that a finite set of such measurements is never informationally 
complete in the infinite-dimensional Hilbert space and thus the MLME scheme 
is necessary to obtain a unique estimator. Figure 1.9 shows the dependence 
of log (£(/))) /N and S(p) on A such that 6X(A — > 0; p) = 0. In practice, A 
can be chosen from a range near zero, within which log (£(/>)) /N and S(p) 
remain almost constant. 









/ S(p) 




x . ^og£(p)/Ar ; 







iir 5 o.ooi o.i 10 iooo io 5 
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Figure 1.9: A simulation on quantum tomography on a randomly generated mixed 
state of light in the five-dimensional Fock space. In this plot, the number of copies of 
quantum systems measured is fixed at N = 10 4 . A choice of 20 quadrature eigenstates 
made up of four different i? settings, with five x values corresponding to each setting, 
which are projected onto this space was used and state estimators are constructed 
for different values of A. As A decreases, both the entropy and likelihood functionals 
approach their respective optimal values obtained from MLME (i.e. when A — > 0). 
When A is zero, there is a convex set of estimators giving the optimal likelihood 
value. For very large A values, the estimators approach the maximally-mixed state 
and hence S(p) approaches the maximal value log 5. 



Homodyne detection tomography is commonly used not only in quan- 
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turn tomography on the true state, but also in quantum diagnostics where 
a given true state is to be classified as being classical/non-classical or sepa- 
rable/entangled. With the help of the coherent states \a) (a\, the following 
decomposition 



for a state p can be used to distinguish classical states from non-classical 
ones, where the function P(a!) is known as the Glauber- Sudarshan P function 
(Roy Jay Glauber and Ennackal Chandy George Sudarshan) of the complex 
parameter a' . Using this decomposition, we define the state p to be a classical 
state if P{a) is positive for all a, and only then: that is, p is a statistical 
mixture of coherent states. Otherwise, p is non-classical. The symbol (da) 
denotes the integral measure over the real and imaginary parts of the complex 
variable a. 

One very popular way to represent the measurement data obtained in a 
typical homodyne experiment is by means of the Wigner functional (Eugene 
Paul Wigner) of p defined as 



This functional is a quasi-probability density functional that maps the sta- 
tistical operator onto the phase space (see Ref. [Wig32]) and has many nice 
properties that are symmetric with respect to the phase space variables x and 
p. In addition, this functional can be used to determine if a state p is non- 
classical. To see this, we note the coherent-state representation of p defined 
in Eq. (1.4.68) and the expression for the wave function of the ket \a) given 




(1.4.68) 




(1.4.69) 



by 
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Using these equations, 



W(x,p) = J (da')P(a') J dye 1 
j\da')\p{a') 



a ) { a'* 



X+ 2 



2 

7T2 



/ 



x dye 



2v ^exp^^+i P j 

y(da')^(«')e- 2|a ' |2+2 ^ Re{(a; - ip)a ' } . (1-4.71) 

Since the exponentials are always positive, any non-positivity of W(x,p) must 
originate from a non-positive P(a'). The converse is in general not true, 
however, as there are non-classical quantum states that give positive Wigner 
functions. A naive quantity that is often investigated as an indication of 
whether an unknown true state is non-classical is the value of the Wigner 
functional at the phase space origin evaluated with a reconstructed estimator 
p for the unknown true state. This is defined as Woo = W(0, 0) = 2tr{pP}, 
with the parity operator V = J dx \x) (—x\. In the Fock representation, the 
parity operator becomes 

? = /d* |*)(-*| 

oo . 

= / dx \x) (— x\n) (n| 

n=0 =(-l)"<x|n) 
oo 

= X»(-lHn| (1.4.72) 

n=0 

= (-1) AU (1.4.73) 



due to the property of the Hermite polynomials contained in the complex 
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function (x\n). To obtain an estimator p, one would need to choose a sub- 
space from the infinite-dimensional Hilbert space in which the reconstruction 
procedure is tractable. This means that the value of Woo will depend on this 
truncation, which in turn relies on the prior knowledge one has about the 
true state. Using the new MLME scheme, we perform a simulation, shown in 
Fig. 1.10, to illustrate this dependence. 



1.0 - 



0.5 r 

0.0; 
-0.5 ; 

-1.0 1 



5 10 15 20 

Dimension 

Figure 1.10: A simulation on quantum tomography on a randomly generated mixed 
state ptruc of light in the 20-dimensional Fock space with a slightly positive Woo = 
0.141 . £>t r and Woo respectively denote the trace-class distance between the recon- 
structed estimator and the true state and the Wigner functional at the phase space 
origin, both averaged over 50 experiments with N = 10 4 . The same set of 20 quadra- 
ture eigenstates as in Fig. 1.9, projected onto this space was used and this set of 
measurements is informationally complete in the two-, three-, and four-dimensional 
Fock subspaces (shaded region). The values Woo and D tI were obtained by ML 
[SMBF93, OTBG06, NNNH+06] in subspaces of dimensions two to four, and by the 
MLME scheme in dimensions greater than four. The plot shows a strong dependence 
of Woo and D tr on the subspace dimension. In this case, it is obvious that the nega- 
tivity of Woo inferred by a reconstruction in a subspace too small is just an artifact 
of the truncations. Also, D tI decreases as the reconstruction subspace increases in 
dimension. This demonstrates the advantages of the MLME scheme over the ML 
method. 

If the true state lies outside the subspace of interest, then the estimated 
value of Woo can drastically deviate from the true value. It is clear that a 
truncation of the Hilbert space into a smaller reconstruction subspace can lead 
to diagnostics which are highly incompatible with the true result. So, if one 
is interested in performing an objective quantum tomography experiment on 
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a given collection of identically-prepared quantum systems with some prior 
knowledge regarding its true state, an option would be to reconstruct the 
MLME estimator in the largest possible subspace based on this prior knowl- 
edge. By enlarging the reconstruction subspace, many more admissible states 
are taken into consideration and more reliable state estimations and quantum 
diagnostics can thus be performed. We now have an operational reconstruc- 
tion scheme that combines our knowledge and ignorance about the unknown 
true state to give us a unique state estimator in an objective way. 

Time-multiplexed detection tomography 

Next, we apply the MLME technique to simulation experiments on time- 
multiplexed detection (TMD) tomography [ASS + 03, HHP04]. For experiments 
of this type, photon pulses of a particular quantum state, where each pulse 
is a wave packet containing a few photons, are sent through a series of beam 
splitters**, each associated with a certain transmission probability. Behind 
each of the output ports of such a series is a single-photon detector that either 
registers a click from an incoming split photon pulse, with some detection 
efficiency, or does nothing. Thus, each output port has a certain overall 
efficiency fjj which is related to the relevant transmission probabilities and 
detection efficiency (See Fig. 1.11). 

As a consequence of this, the POM outcomes 



will be a mixture of Fock states, with the coefficients cj n related to m 
[RHH+03]. If there are iV ports output ports, where all rjjS are different, there 

will be 2 NpoitB distinct POM outcomes that arise from the binary nature of the 

**The word "beam splitter" , used in this context, represents a class of possible apparatuses 
used to split photon pulses, which includes conventional beam splitters, optical fibers, etc. 
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Figure 1.11: A schematic diagram representing the time- multiplexed setup with K+l 
output ports. The TjS are the respective transmission probabilities for the jth beam 
splitter. The overall efficiency for, say, the kth port is given by fju — ?7fe(l — 7fe + 
TK+i5k,K+i) ]lj = 



=1 



single-photon detectors. In addition, X]j=i H = 1 since the 2 Arports binary 
sequences of detection configurations constitute all possible events. These 
POM outcomes commute and a measurement of these outcomes only gives 
information about the diagonal entries of the statistical operator of the true 
state in the Fock basis. In order to obtain information about the off-diagonal 
entries, one can, for instance, displace the current set of 2 NportB POM outcomes 
in phase space with some complex value away from the origin using the 
displacement operator 

V(a k ) = e akA ^-< A . (1.4.75) 
Then, the new set of outcomes 

n j(«fc) = ^V{a k )ILp\a h ) , (1.4.76) 

with Af being the total number of such displaced set of 2 Nports outcomes, do not 
commute with the undisplaced set. These displaced outcomes are suitable for 
a measurement that is designed to obtain information about the unknown true 
state by sampling over multiple a^s. Experimentally, these displaced POM 
outcomes can be realized with unbalanced homodyne detection [WV96]. 

In the simulations, four output ports, corresponding to a total of 2 4 = 16 
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POM outcomes, are considered. Two different true states are selected to 
illustrate the results of MLME. The first true state is chosen to be a stationary 
state of a laser given by 



II 



p ss = e-^\n)^-(n\ , (1.4.77) 

n=0 



where /i defines the mean number of photons [WV02]. For the second true 
state, the state p a i = \ ) a , a ,{ |, where 

Ia ' ) + | - a,) (1.4.78) 



2(l + e -2M 2 ) 

is the superposition of the coherent states \a!) and |— a'}, is chosen. Statistical 
operators are first reconstructed from the simulated data. For this reconstruc- 
tion, one has to decide on the dimension D sn ^ of the truncated Hilbert space 
for the reconstructions. This procedure, also commonly known as state-space 
truncation, depends on the prior information about the unknown state. In our 
case, suppose one knows that the mean number of photons of the source is 
H ~ 4, which is the value assigned in the simulation. Then, one may anticipate 
that all the relevant information about the true state should be contained in 
a Hilbert space of a dimension which is close to p. In fact, it is a common 
practice to choose D sn ^, compatible with this information, such that the dis- 
placed operators form an informationally complete POM. Then, the standard 
ML method can be applied to state estimation. We shall compare the result of 
this approach with another, perhaps more objective, methodology in which we 
select a larger subspace compatible with this prior information and estimate 
the state with MLME. 

To represent the reconstructed statistical operators p su b , the Wigner func- 
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tions W(x,p) of the dimensionless position and momentum quadrature values, 
x and p respectively, are calculated in accordance with^t 



Aiub — 1 Asub — 1 

W(x,p) = 2e-I Q I 2 E HPsubln) 

m=0 n=0 



(_!)«< / _ <l( x + iS gn(n-m)^n > -n <L (n > -n<) ^ | Q |2) 

V 2 n< ?i>. 



(1.4.79) 



where Ln\y) is the degree-re associated Laguerre polynomial (Edmond Nicolas 
Laguerre) in y of order ^ and a = x + ip, for all the statistical operators. Here, 
we define n< = min{m,n} and n> = max{m,n}. 

To quantify the non-classicality of the statistical operators, we make use of 
the concept of non-classicality depth introduced in Ref. [Lee91]. Let us define 
the function 

^(«; T ) = ^ j ( d ^) 2 ex P ( J a/ ^~ w ^ p(w) , (1.4.80) 

where w is a complex variable, P{w) is the Glauber-Sudarshan P function, 
and the parameter r is in the range < r < 1. From the above definition, 
it follows that TZ(a; r) is a continuous interpolating function of r from the 
typically singular, as well as non-positive, P(a/y/2) (r — >• 0), to the Wigner 
function W(a) (r = 1/2), and finally to the positive Husimi Q function 
(Kodi Husimi) Q(a/y/2) = (a/y/2\p\a/y/2) (j — > !)• The non-classicality 



depth is then defined as the smallest value r = f, above which lZ(a; r) > 0. 
Any mixture of coherent states is therefore a classical state since, in this 
case, f = 0. A quantum state with f > is a non-classical state. This 
measure of non-classicality captures the non-classical nature of quantum states 
through a one-parameter family of functions, which can otherwise be invisible 



"Refer to Appendix B for its derivation. 
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(a) True state 
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(b) 5-dimensional ML estimator (c) 11-dimensional MLME 

estimator 

Figure 1.12: Density plots of the Wigner functions, in phase space, of various sta- 
tistical operators for (a) the true state (20-dimensional stationary state of a laser, 
\i = 4) with f s» 0.394, (b) the 5-dimensional ML estimator with f w 0.921 and (c) 
the 11-dimensional MLME estimator with f sa 0.489. Here, brighter regions indicate 
the locations of larger Wigner function values, and vice versa. The statistical oper- 
ator for (b) is obtained using ML by assuming a 5-dimensional subspace in which 
the displaced POM outcomes are informationally complete. The statistical operator 
for (c) is obtained by assuming a larger subspace of dimension 11 using MLME. Nu- 
merous artificial non-classical features of the ML estimator, a signature of its highly 
oscillatory Wigner function, are manifested as an abnormally large value of f , an 
inevitable byproduct of state-space truncation. One can see that with MLME, extra- 
neous artifacts of the Wigner function resulted from such a truncation can be largely 
removed. 
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-5 5 -5 5 

(a) True state (b) 8-dimensional ML estimator 




-5 5 -5 5 

(c) 10-dimensional MLME (d) 15-dimensional MLME 

estimator estimator 



Figure 1.13: Density plots of the Wigner functions, in phase space, of various statisti- 
cal operators for (a) the true state (p a ', at' = 5), (b) the 8-dimensional ML estimator, 
(c) the 10-dimensional and (d) 15-dimensional MLME estimators. In this case, the 
Wigner function of the ML estimator differs greatly from that of the true state, an 
example of misleading information obtained via state-space truncation. A transition 
in the structure of the Wigner function occurs at Z? su b = 10, with the MLME estima- 
tor for -D su b = 15 giving a more accurate estimated picture of the Wigner function of 
the true state. 
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to measures involving a fixed value of r, such as the conventional negativity of 
the Wigner function. This non-classicality depth is but one of a few approaches 
for quantifying the non-classicality of quantum states and we will, without 
fixating on this quantity, adopt it as an appropriate measure that is not worse 
than other proposals. The generalization of Eq. (1.4. TO)** to arbitrary values 
of r, 



.H^ D sub -lD snb -l 



K(x,p;t) = 6 - ^2 ^ HPsub|™} 



m=0 n=0 

n> —n < 



l«l 2 



v^(l-r) J n< \2t(1-t) 

(1.4.81) 

is useful for the numerical computation of f. For the truncated version 



Amb — i 

o sub - 



Y \n)^-(n\ (1.4.82) 

Z-m=0 n! n=0 



of the stationary state in Eq. (1.4.77), taking , Eq. (1.4.81) simplifies to 

K ss (x,p;t) 



' Z^ n =0 n! n=0 v 7 v v J/ 

The performances of both MLME and the standard ML method on the 
true states defined in Eqs. (1.4.77) and (1.4.78) are illustrated by the Wigner 
function plots of the respective statistical operators obtained from both meth- 
ods. These are shown in Figs. 1.12 and 1.13. The respective non-classicality 
depths are also computed for Fig. 1.12. For the state p a /, all the reconstructed 
statistical operators are highly non-classical, with f = 1 [TBS02] for all them. 



H Refer to Appendix C for its derivation. 
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Rather than comparing the f values, the structure of the Wigner functions for 
various reconstruction subspaces will be briefly analyzed instead in Fig. 1.13. 

Light-beam tomography 

Finally, we make use of the MLME algorithm to reconstruct states of clas- 
sical light beams that are measured using the Shack-Hartmann (SH) wave 
front sensor (Roland Shack and Johannes Franz Hartmann). An incoming 
light beam is transformed by a regular array of microlens apertures and de- 
tected in its rear focal plane by a charge-coupled device (CCD) camera (see 
Fig. 1.14). A plane wave traversing in the transverse plane of the SH sensor 
gives rise to a detection, where the individual diffraction patterns are centered 
at the corresponding optical centers of the microlenses. For a distorted wave 
front, the observed diffraction pattern behind the feth microlens aperture will 
be deflected by an angle Since the set of angles Ok is related to the local 
wave front tilts with respect to the transverse plane of the SH sensor, the 
shape of the wave front can be inferred. Clearly, this standard technique of 
wave front reconstruction fails in the presence of imperfect coherence, where 
the notions of "wave front" and "optical phase" are no longer well-defined and 
a more general description of the state of the light beam is necessary. 

Recently, an alternative theory for SH detection, based on the principles 
of quantum state tomography, has been introduced. It was shown that a 
complete characterization of a beam of light is possible from the measurement 
data obtained with the SH sensor under certain assumptions with regards 
to the aperture profiles [HRSS10]. Analogously to quantum states, we can 
describe a coherent beam (mode), with a complex amplitude ip(x), by a ket 
\tp), such that ip(x) = {x\i/j}. 

The transformation of the complex amplitude ip(x) of an incoming light 
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fcth microlens 






diffraction patterns 



Figure 1.14: Schematic diagram of the diffraction patterns of an incoming light beam 
that is obtained from a SH wave front sensor. The light beam is transformed by an 
array of microlenses (apertures). A CCD camera is placed at the rear focal plane of 
the array. The measurement data consist of the measured intensities of the beam. The 
intensity at the jth pixel, located at position Xj, behind the fcth microlens aperture 
is denoted by I)~{xj). 



beam, which is propagating from the fcth microlens aperture to the SH sensor, 
can be described by the linear transformation [Goo05] 




(1.4.84) 



With the identity 




(1.4.85) 



the complex amplitude ipp T o P (x), after propagation, is given by 




(1.4.86) 



where l)k(x — %') = 7prop(<5(ai — a/)) is the impulse response function of the 
fcth microlens aperture, which describes the free propagation of the beam from 
the aperture to the SH sensor. Apart from wave propagation that is energy- 
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conserving, there is an additional effect on the wave amplitude as the light 
beam passes through the microlens aperture that can result in energy attenu- 
ation. This is mathematically described by the multiplicative transformation 
ip(x) —> ak(x)ip(x), where the aperture function ak(x) of the kth. aperture 
gives the resulting aperture effect on the beam profile. Hence, on the focal 
plane of the kth microlens aperture where the SH sensor resides, the final 
complex amplitude ipt(x) of the beam is given by the convolution integral 



Since the detection region of the SH sensor is small, we can compare 
Eq. (1.4.86) with the Fresnel diffraction equation (Augustin-Jean Fresnel) for 
the normalized amplitudes, that is 



where A is the wavelength of the beam, ( = 2ir/\, and irrelevant phase factors 
are neglected, to conclude that the normalized impulse response function can 
be defined as 



Here, the z direction is taken to be the optical axis. It follows that the 
functions \]k{x — y) are orthogonal. That is, 




(1.4.87) 






(1.4.89) 




f«(*~v) 



S(x - y) . 
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More generally, this orthogonality property follows directly from energy con- 
servation of the light field during propagation. By defining /£ rop (x) to be the 
intensity of the propagated beam from the feth aperture, at position x, to be 

•p(x) | 2 to be the initial intensity at the same 



-^fc rOP ( x ) = I ( x )\ 2 an d I(x) 



position before propagation, 



/dx'/rv) 

J dx' \ipi k \x')\ 2 

J dx' J dx" t)* k (x' - x"W{x") j dx'" \) k (x' - x"')i){x"') 
j dx" j dx'"^{x")i,{x'") j dx't)* k (x' -x")t) k (x' -x'") 
f dx" \iP(x")\ 2 
fdx"l { x") 

dx' \)%{x' - x")t) k (x' - x'") = 5{x" - x'") . 



(1.4.91) 



Suppose now, a generic partially coherent beam is detected by the SH sen- 
sor. We can describe the state of such a beam with a coherence operator p co h- 
Using a computational basis of orthonormal modes \ip n ), the D-dimensional 
coherence operator p co h is given by 



Pcoh 



/ „coh /^coh ^ 

r 00 "' POD-I 



i n coh . . . coh 
\rD-10 VD-1D-1/ 



(1.4.92) 



By defining the aperture operator 



(1.4.93) 
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for the kth microlens aperture and the impulse response operator 

H k = J dx" J dx' \x') t) k (x' - x") (x"\ (1.4.94) 

that is unitary from the orthogonality relation in Eq. (1.4.90), the represen- 
tation of the corresponding transformed state p' coh , 

Pcoh = ttk%lk Pcoh Slfeitfe 

= ^il fe 2l fc |^ m )p^(^|2l fe 4 

s v ' v v ' 



E 

mn 



on the focal plane of the apertures follows from the linearity of optics trans- 
formations. The intensity Ik(xj) at position Xj* on the focal plane of the kth 
aperture is 

h(xj) = (xj\p' C oh\ x j) 

= ( x i\ ( l^ m .i) P' mn (^n,fe| J \ x i) (1.4.96) 

^ mn ' 

= E Pmn' l l 3 'm,k( x j) rf'nftfajY > 
mn 

where ip' n k(xj) = {xj\ip' nk ) are the complex amplitudes of the transformed 
light beam obtained from the amplitudes ip n (xj) = (xj\ip n ) of Eq. (1.4.87). 
Since /? co h possesses all the properties of a statistical operator, the MLME 
technique can be used to estimate the true coherence operator p*^ 6 that 
describes a given light beam. To this end, we need to compute the cor- 
responding POM describing the measurement outcomes of the SH sen- 



*In order to talk about a physical position ket \xj), it is important to understand that 
the specification of Xj comes with a certain finite precision. As such, these physical kets 
now normalize to the Kronecker delta, that is {xj\xji) = Sjy. 
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sor. By relating I k {xj) to the corresponding probabilities of the outcomes 

U k{Xj)= 52 m n Wm) U k,nm(xj) ($J, we have 

I k (xj) = tv{p coh U k (xj)} 

coh (1.4.97) 

= / 4 Pmn ^k,nm(Xj) ■ 
mn 

Comparing Eqs. (1.4.96) and (1.4.97), the positive operator describing the 
detection outcome at the jth pixel of the CCD camera behind the fcth aperture 
is given by 

Ti k ,nm{Xj) = ip' mik {xj) tp' n , k (xj)* . (1.4.98) 

In the experiment, a controlled preparation of optical beams is realized us- 
ing the principles of digital holography [HMSW92, BC04]. Figure 1.15 shows 
the set-up. The essence of the beam preparation lies in the numerical construc- 
tion of a digital hologram that is programmed to produce a superposition of a 
reference plane wave and a beam with the true state p*^ 6 of interest. This is 
achieved with the help of an amplitude spatial light modulator (OPTO SLM) 
with a resolution of 1024x768 pixels. The hologram is then illuminated by 
the reference plane wave that is considered in the superposition. To approxi- 
mately produce this plane wave, a collimated Gaussian beam is generated by 
placing the output of a single-mode fiber at the focal plane of a collimating 
lens. In this way, the digital hologram can be fully situated at the center of 
the collimated Gaussian beam of a larger beam waist, where this beam can 
then be approximated to be a plane wave with high accuracy. The resulting 
diffraction spectrum, after illuminating the digital hologram with the colli- 
mated Gaussian beam, involves several diffraction orders, of which only one 
contains useful information about p!~£ e - To filter out the unwanted diffraction 
orders, a 4-/ optical processor, with a small circular aperture stop placed at 
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Figure 1.15: Experimental set-up involving a single-mode fiber (SMF), a spatial light 
modulator (SLM), an aperture stop (A) and a Shack-Hartmann (SH) sensor. 
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the rear focal plane of the second lens, is used for this purpose (the aperture 
stop in Fig. 1.15). The resulting light beam with the state P—if * s then fo- 
cussed at the rear focal plane of the third lens. This completes the preparation 
stage. 

The measurement of the light beam involves a Flexible Optical SH sensor 
with 128 microlenses that form a hexagonal array. Each microlens has a focal 
length of 17.9mm and a hexagonal aperture with a diameter of 0.3mm. The 
signal at the focal plane of the array is detected by a uEye CCD camera that 
has a resolution of 640x480 pixels, with each pixel being 9.9|J.mx9.9um in 
dimensions. 

The aforementioned set-up is used for generating and analyzing low-order 
Laguerre-Gaussian (LG) modes. The LG modes can serve as important re- 
sources in quantum information processing [MVWZ01]. In this experiment, 
only LG modes with no radial nodes are considered. Such modes form a one- 
parameter orthonormal basis, where the modes are specified by the orbital 
angular momentum quantum number I. In polar coordinates, the relevant 
part of the complex amplitude of a LG mode LG;, for a fixed /, is given by 



On the other hand, the orbital angular moment operator L z in the z direction, 
in position representation, is given by 



(s,(p\LG t ) oc s'e^e" 



(1.4.99) 



(x, y\ L z = (x, y\ {X x P y - X y P x ) 




(1.4.100) 



To express the derivatives in Eq. (1.4.100) in terms of polar coordinates, we 
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begin with the parametrization 



x = s cos f , 
y = s sin f . 



(1.4.101) 



In a compact matrix form, the corresponding variations are then given by 




cosy? — ssmcp 
sin if s cos if 




By inverting the matrix equation, we get 

.if 
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\ 

Using the definitions 
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(1.4.102) 



(1.4.103) 



for the total variation of a function / and Eq. (1.4.103), we obtain 
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(1.4.104) 
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Figure 1.16: CCD image for the state p*™- The relevant part of the SH readout 
used for the beam reconstruction is shown. Contributions from the individual SH 
apertures are indicated by bright spots, with each spot made up of multiple pixels. 
Note that the two void regions correspond to the phase singularities of the state p™?. 
This hints that p^ h e « 



Hence, 



h d 

(s, tp\ L z |LG/) = - — ( s , ip\LG t ) 
i dip 

= lH{ a ,(p\LGi) 
= (s, tp\ lh |LG/) 



for all (s, cp\. This shows that |LGj) is an eigenket of L z , implying that each 
photon, prepared in the state |LGj) (LG;|, carries an orbital angular momen- 
tum of lh. 

For the source of light beams, we would like to prepare the state p*^ 6 = 
Pcoh = iVWp) (V'supl, where 



|Vsup) = (|LGo> - |LGi) i - |LG 2 )) ±= , (1.4.105) 



using the OPTO SLM. In the presence of experimental imperfections, however, 
the true state p^otf prepared this way will not be exactly the same as p™h- 
After measuring this beam with the SH sensor, the data are processed using 
the MLME algorithm in Eq. (1.4.62) to obtain the estimator p^ ME for p^ e , 
since G < 1. To quantify the quality of p^ ME , we investigate the fidelity 
between p^ ME and p™£. 
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Figure 1.17: MLME state estimation from informationally incomplete data for D su b = 
9. The real (left) and imaginary (right) parts of the reconstructed coherence operator 
Pcoh ME are shown. The reconstruction subspace is spanned by the modes LG;, with 
I = 0, 1, ... ,8. In this case, 56 out of 91 independent outcomes, required for complete 
characterization of p' 1 ^ , are not accessible, yet the MLME estimator p^ ME is close 
to p'X with a fidelity of 92%. 



Figure 1.16 shows the CCD image for the state Each aperture gives 

rise to a bright spot in the CCD image. To maximize the signal-to-noise ratio, 
only the pixel with the highest intensity within each spot is selected as a mea- 
surement datum. The set of intensities, corresponding to maximum-intensity 
pixels, constitute the measurement data to be used for state reconstruction. In 
our case, the corresponding POM consists of 35 linearly independent outcomes 
described by Eq. (1.4.98). This measurement is, therefore, informationally 
complete for D su b < 5. In cases where state reconstruction on informationally 
complete subspaces gives unsatisfactory results, the MLME approach can be 
used on the informationally incomplete data to give reasonable estimators on 
a larger subspace, as illustrated in Fig. 1.17. 

So far, the procedure of state-space truncation is performed in the basis of 
the LGi modes. In this basis, when p*^ 6 ^ s known to be quite close to p™jj, * ne 
truncation of modes of higher orders will not result in a great loss of recon- 
struction information, as implied by the structure of p^h m ^q- (1-4.105). The 



82 



Chapter 1. Quantum State Estimation 



situation will be very different when there is no such prior knowledge about 
/°coh C > exce Pt for the fact that the possible values of I lie in a certain range. In 
this situation, there is no appropriate strategy to choose a computational ba- 
sis in which the state-space truncation can be done effectively and justifiably. 
More generally, estimating the unknown state p^oif on a t runca ted subspace 
can very often result in missing important reconstruction information and this 
will lead to strongly biased estimators. A remedy for this problem is to per- 
form state reconstruction on a sufficiently large subspace that is compatible 
with the knowledge about the range of values of I. 

To emphasize this point, we simulate the following scenario: 

• The set of measurement data, obtained from the CCD image shown in 
Fig. 1.16, is distributed to 50 parties. The possible values of I for the 
true state p^^ are known to lie in the range I G [0, 7]. 

• Each party selects a computational basis and estimates the state of the 
beam for -D su b = 3, 4, . . . , 8 using either the ML (for D su b < 5) or the 
MLME algorithm (for D suh > 5). 

• The reconstructed estimators for the six values of -D su b are reported by 
each party and the average fidelity of the estimators for every value of 
Z) su b are calculated. 

A typical outcome of this scenario is shown in Fig. 1.18. As can be seen, per- 
forming state-space truncations in order to reconstruct p*ojf with an informa- 
tionally complete set of data generally leads to low fidelities in the estimators. 
Increasing the number of degrees of freedom and using the MLME algorithm 
to cope with the completeness issue seems to be a much better strategy. 
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Figure 1.18: Average fidelities, computed over 50 random choices of computational 
bases, of the estimators for different dimensions -D su b of the reconstruction subspace. 
The unfilled (filled) circular plot markers correspond to informationally complete 
(incomplete) tomography, respectively. 

i- SECTION 1.5 1 

Hedged quantum state estimation — a comparison 

As strongly advocated by Robin Blume-Kohout [BKH06, BKlOb], this method 
has at least two advantages compared to the maximum likelihood estimation 
protocol. Firstly the estimator obtained this way is always full-rank, thereby 
eradicating the problem of zero eigenvalues which are not necessarily justified 
by a finite number of measurement copies. This is because a zero eigenvalue 
corresponds to zero probability for a particular outcome in for instance the 
eigenbasis of the estimator and this requires an extremely high confidence 
which the measured data cannot give. Secondly, the likelihood functional 
C({nj};p) in general has a broad peak over a range of statistical operators. 
By looking at just the peak of the likelihood functional, one eliminates all other 
possible states that are close to the maximum. Therefore it is more reliable 
to take into account all possible states in the vicinity to give an estimator 
that is much less sensitive to slight changes in the measured data than the 
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maximum likelihood estimator. However it is typically hard to evaluate the 
integrals and a systematic way of choosing a suitable prior and the volume 
measure (dp) is unknown [BKlOb]. 

Recently, Robin introduced the hedged likelihood functional [BKlOa] which 
is given by 

Cn({nj};p) = (det P fC({n 3 }; p) , (1.5.1) 

where /3 ~ \. It is analogous to the classical Bayesian method of supplying 
a Dirichlet-type prior probability distribution (Johann Peter Gustav Lejeune 
Dirichlet) and gives the following estimated probabilities 

nj + f3 
Pj ~ N + D/3 

when the measurement operators are now projectors of any complete set of 
orthonormal basis states in the D dimensional Hilbert space. This smooth, 
unitarily-invariant hedging functional (det p)^ was proven to be the unique 
one for carrying out such a transformation. It is shown that maximizing this 
functional will result in an estimator which is always full-rank and therefore 
more compatible with finite number of measurement copies. 

In this last section of Chap. 1, we first review some properties of 
£n({ n j}; P) which were mentioned in [BKlOa] using variational methods in 
§1.5.1. Next we will derive an iterative scheme to maximize Cu({ n j}] p) based 
on the steepest-ascent method in §1.5.2. In §1.5.3, we will discuss informa- 
tionally incomplete measurements and report some interesting features with 
regards to the hedged maximum likelihood estimators. In particular, we first 
prove that given any POM in general, informationally complete or not, the 
estimator that maximizes £-n({rij}; p) (HML estimator) is unique. Next we 
show, by means of qubit tomography simulations, that for some POMs, the 
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HML estimators are actually relatively close to the estimators that maximize 
both the conventional likelihood and von Neumann entropy functionals simul- 
taneously (MLME estimator) even for relatively small N. 



1.5.1 The hedged likelihood functional 

The main objective is to maximize the concave hedged likelihood functional 
Cn({nj}; p) defined as 

£ H ({%};P) = (det pfC({ nj };p) , C{{n 3 }-p) = JJp^ , (1.5.2) 

3 

where the probabilities pj = tr{pIL-}. As always, we can equivalently maxi- 
mize the log- likelihood functional logC}i({nj}; p). Performing a variation on 
log£ H ({%};p), we have 

5 log C u ({nj};p) = ff ^^f + 51og£({ra 3 -};p) 

= tr{((3p- 1 + NR)8p} , (1.5.3) 

where R = ^ - fjUj/pj. To get the second equality for the first term, we 
invoke the identity 

Sdetp = (detp)tr{p _1 5p} . (1.5.4) 

With the usual parametrization presented in Eq. (1.3.3), we obtain the vari- 
ation 

M r x v tr{ [^p- 1 - D) + N(R - 1)] (AUA + 8AU)} 
51og£ H ({n J };p) = tr{.4U} ' (L5 ' 5) 
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By setting 5 log Cn({rij}; p) = 0, we arrive at the extremal equation 

0(1 - D PRML ) + N(R HML ~ 1)phml = , (1.5.6) 

where -Rhml = J2j fj^j/Pj with Pj = ^{PBMiMj}- 

From the extremal equation in (1.5.6), we can recover two properties of 
Phml which were mentioned in [BKlOa]. Assuming now that the POM out- 
comes are projectors of a given set of D orthonormal basis states used to 
represent puml, i-e. 



n i = li) 01 > PHML = \3)Pi 01- 



This set of measurements is not informationally complete since the number 
of measurement outcomes is D, which is less than the minimal number D 2 
required to unambiguously specify a state. Then by direct substitution of 
the forms of II j and /5hml into Eq. (1.5.6), multiplying 11^ on both sides and 
taking the trace, one can obtain the expression for as 



Pk = N + DP ' (L5 - 7) 



which is exactly the "add (3 rule" that assigns a small non-zero probability 
for outcomes with zero occurrence in a finite-sample tomography experiment. 

To show the next property, that is the eigenvalues of /5hml are non-zero 
for any IT,- in general, a transparent approach is to rewrite both Eq. (1.5.6) 
and its corresponding adjoint statement as 



N 

D-j(Rhml-I) 



Prml = Phml 



N 



1. (1.5.8) 
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It is now clear that the extremal equation enforces the existence of the inverse 
of any p H ML, with 



This means that for any non-zero (3, />hml is always full-rank. Therefore the 
peak of Cu({nj}; p) for any given set of fjS always lies inside the admissi- 
ble state space. This is consistent with the fact that the hedged likelihood 
functional goes smoothly to zero on the boundary of the state space. 

It was also reported that for most of the mixed states, taking (3 = 1/2 gives 
optimal estimation results with respect to some distance measures between the 
true state ptvue and phml- For nearly-pure states, a small value of (3 is needed 
to achieve good accuracy since now the true states can have eigenvalues that 
are very close to zero and so large (3 values can result in significant deviations. 
Keeping in mind that pure states are, strictly speaking, a fiction in practical 
state preparation, we will set the (3 = 1/2 in the subsequent analysis. 

1.5.2 The HML algorithm 

A way of searching for the maximum of the hedged likelihood functional is 
to start from an arbitrary state, usually the maximally-mixed state p\ = 
1/D, and ascend in the direction of the steepest gradient. To determine this 
direction, we revisit Eq. (1.5.5) and recognize that the two component gradient 
d log £h is given by 



N 



(-Rhml — 1) ■ 



(1.5.9) 



dlog C K /dA 
d log Cu/dA^ 




d log £ H 
dA 
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d log £h 
&4t 



<91og£ H 



(1.5.10) 



V 



where 



<91og£ H _ A[j3{p- 1 - D) + N{R-l)] 
dA ~ tr{.AU} 



(1.5.11) 
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bZ = 


M 


e 








= 2 





which follows from Eq. (1.3.6). 

In order to ensure that 6 log £h is always positive in the search process, 
we can set the variations 5.4 and bAJ to be proportional to the respective 
derivatives dlogCn/dA and d log Cu/dA^ , the steepest-ascent method. Thus, 
the variation of the two component vector operator Z = (A, A^) T is given by 



(1.5.12) 



for a small e parameter. We thus have a simple iterative scheme (HML) to 
look for the extremal state phml which maximizes the hedged likelihood given 
an initial statistical operator pi, which is given by 

HML iterative equations 

_ [1 + A k ]p k [l + A k ] 
Pk+1 ~ tr{[l + A fc ] Pfc [l + A fc ]} ' (L5 - 13) 

A k = € -[(3( Pk - 1 -D) + N(R k -l)]. 

There exists a slight technical detail in choosing an appropriate e for the 
entire iteration. We note that the ratio A k /e involves the inverse of p k in 
every step and is of the order of N, which can be significantly large as the 
number of detected copies increases. Setting e too large, even to the order of 
1, can result in a rank deficient p k that can produce an indeterminate inverse 
since the iterative equation tends to that of ML for large N. By experience, 
e = 1/N seems to be a wise choice. 



1.5.3 Informationally incomplete measurements 

Typically, we use a set of informationally complete measurement outcomes 
to infer a positive statistical operator which is compatible with the measured 
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data. One can do this by looking for the unique statistical operator which 
maximizes the conventional likelihood functional C({nj}; p). We will therefore 
require at least a minimal set of D 2 linearly independent measurement out- 
comes to obtain a unique estimator. The situation changes when we perform 
informationally incomplete measurements. As discussed previously, MLME is 
one method of obtaining a unique and statistically meaningful estimator out 
of a set of informationally incomplete data. 

An interesting property of the estimator /Shml is that it is always unique 
for any given set of measurement outcomes II j (See Appendix D for a proof). 
This implies that regardless of whether a set of measurement outcomes is 
informationally complete, maximizing the hedged likelihood functional always 
gives a unique estimator. One can understand this intuitively by drawing 
analogy from the information functional X(A; p) discussed in §1.4.5. Then, it is 
convenient to treat the functional (3 log (det p) as an "entropy-like" term much 
like the von Neumann entropy functional — trjplog p} = — log (det {p~~ p })- In 
this sense, the mechanism of HML is rather similar to that of MLME. 

The distance between the HML and MLME estimators, defined by the 
trace-class distance V tT = tr{|pMLME — Phml|}/2, will depend on ptme and 
the POM outcomes lb,-. In fact, there are cases in which /5hml and />mlme 
are close to each other for a fixed set of measurement data. We illustrate this 
point with two examples. In the first example, we consider qubit tomography 
using the trine POM in Eq. (1.4.17). In the second example, we look at two- 
qubit tomography using a POM consisting of the four standard Bell state 
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projectors defined as 
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A relatively small A" = 500 is fixed throughout the simulations. Figure 1.19 
shows the results. 
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Figure 1.19: A numerical comparison between HML and MLME. A total of 500 ran- 
dom true states ptme are generated for each POM. For every true state, a total of 100 
experiments for a fixed N = 500 were simulated and the average trace-class distance 
T>t^ g was plotted. In each plot, for almost all the random states, the estimators /Shml 
(+) and pmlme (D) almost coincide on average. 
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In general, the distance between phml and pmlme for a fixed set of IljS will 
also depend on the number of detection copies N . As N becomes extremely 
large, the two estimators approach each other for some POMs and in this case, 
any of the two methods is fine as far as state estimation with these incomplete 
POMs is concerned. Figure 1.19 shows that in these two examples, even for 
relatively small N, the distance between the two estimators are in general 
quite small. Hence, the performance of HML and MLME can sometimes be 
comparable even for a reasonably small number of detection copies. 

i- SECTION 1.6 1 

Chapter summary 

We have discussed many aspects of quantum state estimation. In introducing 
the idea of informationally complete state estimation, we established several 
maximum-likelihood algorithms using steepest-ascent and conjugate- gradient 
techniques. We showed that the efficiency of the conjugate-gradient algo- 
rithms is generally higher than that of the steepest-ascent algorithm. It must 
be emphasized that the approach to derive the conjugate-gradient maximum- 
likelihood algorithms is naturally extended to all other algorithms that are 
based on the steepest-ascent method. 

Next, we established maximum- likelihood- maximum-entropy algorithms 
to deal with informationally incomplete data and finally applied these al- 
gorithms to three different types of tomography for state reconstruction of 
complex quantum states with infinitely many degrees of freedom. An impor- 
tant lesson that can be learnt from this study is that with a limited set of 
measurement data, reconstructing an unknown quantum state on a heavily 
truncated Hilbert space, in which the measurement data become information- 
ally complete, using the standard maximum-likelihood technique can give rise 
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to extraneous features in the reconstructed states that arise from the state- 
space truncation. One straightforward approach to minimize this problem 
is to apply the maximum-likelihood-maximum-entropy state estimation tech- 
nique on a larger reconstruction subspace that is compatible with any known 
prior information about the quantum state. The choice of the dimension of 
the reconstruction subspace, as well as an appropriate computational basis for 
the truncation, depends very much on the available prior information and is 
sometimes more of an art rather than a science for complex quantum systems. 

Finally, we derived an iterative algorithm, using the steepest-ascent 
method, to maximize the hedged likelihood functional that was proposed as 
a more operational alternative to Bayesian state estimation. We showed that 
the hedged maximum-likelihood estimator obtained is always unique regard- 
less of the informational completeness of the measurement outcomes, unlike a 
conventional maximum-likelihood estimator. We also gave numerical plots to 
show that for some typical single-qubit measurements, the hedged maximum- 
likelihood estimator is very close to the maximum-likelihood-maximum- 
entropy estimator for a given set of measurement data on average even for 
a relatively few number of copies. Hence for practical purposes, one can rely 
on this new state estimation technique to obtain an estimator that is suffi- 
ciently close to the maximum-likelihood-maximum-entropy estimator for some 
measurements. Otherwise, the hedged maximum-likelihood estimator can still 
serve as a convenient estimator for the unknown quantum state. 



Chapter 2 

Two-qubit Entanglement Detection 
with State Estimation 



Entanglement witnesses are Hermitian observables which, when their expec- 
tation values are measured, can indicate if a given unknown quantum state 
is entangled. In this chapter, we discuss another important application, in 
addition to those discussed in Chapter 1, of the MLME numerical schemes to 
bipartite entanglement witness measurement. 

To this end, we first introduce an unprecedented protocol to measure a 
family of a particular kind of entanglement witnesses at one go [ZTE10] in 
§2.1 and §2.2. Such a family of witnesses are known as optimal witnesses 
[LKCHOO]. An entanglement witness is defined as an optimal witness if no 
other witnesses can detect all entangled states detected by this witness, as well 
as other entangled states. Next, in §2.3, we will establish an adaptive strategy 
to measure these families of witnesses in order to improve the efficiency of 
entanglement detection. 

i- SECTION 2.1 1 

Witness bases measurement 



A general i^-partite pure quantum state (describing a composite of K quan- 
tum systems) is defined as an entangled state if its ket cannot be written in 
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the form 



) 



prod 



*l)|^2>.--|^) 



(2.1.1) 



a product or factorizable form. More generally, a K-partite mixed state is 
denned to be an entangled mixed state if it cannot be written in the separable 
form 



where ^ ■ pj = 1 . By defining to be the partial transpose on the Ith subsys- 
tem, from Eq. (2.1.2), it can be readily shown that p^l p > 0. To determine if a 
given unknown state ptruei with a fixed known K, is entangled, one can mea- 
sure the expectation value of a particular kind of Hermitian observable, known 
as entanglement witness, to obtain some information about the existence of 
entanglement. Mathematically, an entanglement witness 2B is a Hermitian 
operator, 2U^ = 2U, with the property that tr{p sep W} > for all separable 
states and tr{/j ent 2U} < for at least one entangled state p ent . Thus, for a 
given unknown state /?true: 

the condition (2)3) = tr{/9 true 2H} < implies that 
ptrue is entangled. However, if tr{/9tmc2n} > 0, no conclusion can be drawn as 
to whether ptrue is entangled or not. Geometrically, measuring the expecta- 
tion value of an entanglement witness introduces a hyperplane that "dissects" 
the Hilbert space, with the side to which ti{p trne W} < containing only 
entangled states. 

For K = 2 [bipartite systems), a Hermitian operator O is decomposable if 
it can be written as 



in terms of the positive operators 0\ and O2. According to Ref. [LKCH00], a 




(2.1.2) 



o = of + 2 



(2.1.3) 
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D-dimensional, bipartite, optimal decomposable witness is denned as 



2H = Q 



T2 



(2.1.4) 



for a given positive operator Q with no product kets in its range. In other 
words, for any D- dimensional ket \x), the resulting non-zero ket Q\x) must 
be entangled. It is clear that tr{p sep 2H} = tr{/? sep Q T2 } = trjpgj LQ} > 0. 
Throughout the analysis, we fix D = 2 2 = 4 for the case of two-qubit quan- 
tum systems. One can easily construct such optimal witnesses from pure 
states, where Q = \*$>) (^|. From the definition given in Eq. (2.1.4), it follows 
that (\&| must be an entangled state. The Schmidt decomposition of its 
corresponding ket 

|tf) = |00)cosa + |11) sine* (2.1.5) 



is useful for subsequent calculations. Evaluating Q 



T2 



Q T 2 = [(|00)cosa + |11) sin q) (cosa(00| + sina (11|)] T2 
= |00) (cosa) 2 (00| + |11) (sina) 2 (11| 

+ 1 01) sin a cos a { 10 1 + 1 10) sin a cos a (01 1 
= |00) (cosa) 2 (00| + |11) (sina) 2 (11| 



+ 



+ 



(|01> + |10» 
(|01> - ]10>) 



1 

71. 
i 

71. 



sm a cos a 



V2 



«oi| + (io|) 



(— sin a cos a) 



V2 



((01|-(10|) 



(2.1.6) 



The important point of this calculation is to realize that for any pure state 
\fy) (^|, the eigenkets of (|^) (^l) 12 are always the same kinds: two product 
kets {|00) , |11)} and two Bell kets { (|01> + 1 10)) /\/2, (|01) - |10)) /\/2}. 
When we measure the projectors |00)(00|, |11)(11|, |^+) (*+| and 
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|^_) (^-1, we in fact measure a family of optimal witnesses at one go. Such 
a progress allows us to search for the "best" entanglement witness out of the 
measured family that has the highest chance of detecting entanglement of 
Ptrue- We start by defining the witness criterion 

min {<(|tf> (^|) T2 )| > (2.1.7) 

which is obeyed by all separable states and is violated for the entangled states 
that are detected by this family of witnesses. The minimization means that 
we are searching for the witness that maximizes the chance of violating the 
inequality ((|^) (^|) T2 ) > in order to detect the presence of entanglement. 
From Eq. (2.1.6), 

min{<(|*> <*|) T2 >} 
= min |/i (cos a) 2 + f% (sin a) 2 + (/3 — f±) sin a cos a j 
= min [ + (k_f^ sm(2a) + (h_h^ cos(2a) | 

' v ' ' 

= |v / (/i-/2) 2 +(/3-/4) 2 sin(2a+tan-i(|l^|)) 
= fA ^-\^(h-f2? + {h-h)\ 

where f\ = (|00) (00|) and = (|H) (H|) are the measured frequencies (or 
expectation values) for the product states, and f'3 = (|vl/+) (^+|) and f± = 
(\*&-) ( V I / - 1) are those for the Bell states. So, the witness criterion now reduces 
to the simple inequality 

4/i/2 > (h ~ hf ■ (2.1.8) 

Thus, once the frequency data are obtained after measurement, the presence 
of entanglement can be detected as long as the witness criterion is violated. 
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The projectors |00) (00|, |11) (11|, |^+) (^+| and (^_| form an or- 

thogonal POM. This basis is known as a witness basis since measuring these 
projectors amounts to measuring the entire one-parameter family of optimal 
witnesses. In practice, it is possible to set up an experiment to measure such 
a family of witnesses using a photon source. Figure 2.1 illustrates such a 
set-up and Table 2.1 explains the measurement outcomes of the set-up in the 
figure [ZTE10]. Another advantage of witness basis measurement is that, 




Figure 2.1: A linear-optics set-up that offers an experimental implementation of 
the optimal witness of Eq. (2.1.4) for polarization qubits. Two photons that are 
indistinguishable by their spatial-spectral properties are simultaneously incident on 
a half-transparent mirror, photon 1 from the left and photon 2 from the right. They 
carry one polarization qubit each, with their unknown two-qubit state to be analyzed. 
After being transmitted through, or reflected off, the half-transparent mirror, the 
photons are detected behind polarizing beam splitters that reflect vertically polarized 
photons and transmit horizontally polarized ones. The four detectors LH, LV, RV, 
and RH must be able to discriminate between one-photon and two-photon events. The 
four eigenstates of the family of entanglement witnesses are distinguished by different 
signatures; see Table 2.1. By letting the photons pass through polarization changing 
wave plates in the input ports, labeled by WPs 1 and WPs 2, one realizes other 
witnesses that differ from the witness of Eq. (2.1.4) by local unitary transformations. 



unlike conventional witness measurement where only the expectation value of 
2H is collected for inference, all frequency data are used to perform quantum 
state estimation to obtain more information about the unknown state. It 
is therefore desirable to measure an informationally complete set of witness 
bases, such that if all the witness bases miss the entanglement detection, a 
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Eigenket 



Counts (LH,LV,RV,RH) 



|vv) 
|hh) 

(|vh) + |hv))/^2 
(|vh) - |HV))/V2 



(0,2,0,0) or (0,0,2,0) 
(2,0,0,0) or (0,0,0,2) 



(1,1,0,0) or (0,0,1,1) 



(1,0,1,0) or (0,1,0,1) 



Table 2.1: Signatures of the relevant projectors in witness basis measurement set-up 
for polarization qubits (0=v, 1=h), detected by the set-up of Fig. 2.1, with no wave 
plates in the input ports. As a consequence of the Hong-Ou-Mandel effect [HOM87] 
(Chung Ki Hong, Zhe-Yu Ou and Leonard Mandel), the cases (1,0,0,1) and (0,1,1,0) 
do not occur. 



full estimation can be performed to identify the unknown quantum state. To 
construct this informationally complete set of bases, we first think of a single 
witness basis measurement as being equivalent to a measurement of multiple 
observables. These observables can be decomposed into linear combinations 
of tensor products of the single-qubit Weyl operators (Hermann Klaus Hugo 
Weyl) since these operators form a complete operator basis. As we are deal- 
ing with two-qubit systems, the corresponding single-qubit Weyl operators are 
defined as 



X = a x 



h)(v| + |v)(h 



Y = a y = |H)i(v| — |v)i(H 



Z = 



o-z = |v)(v| - |h)(h 



(2.1.9) 



in terms of the polarization basis. Since measuring a two-qubit witness basis, 
which comprises four orthogonal projectors, gives only three independent out- 
comes, this means that we obtain expectation values of only three two-qubit 
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observables. These three observables are 

Z X Z 2 = |hh) (hh| + |vv) (vv| - |^ + ) (^> + | - |\&_) , 
Z x \ + IZ 2 = |hh) 2 (hh| + |vv) 2 (vv| , 
X X X 2 + YiY 2 = 2 - 2 . (2.1.10) 

Here, A\A 2 = A A. With this formalism, we are now able to construct 
an informationally complete set of witness bases. By introducing the Clif- 
ford unitary operator (William Kingdon Clifford) C that permutes the Weyl 
operators cyclically, 

CX = YC, CY = ZC, CZ = XC, (2.1.11) 

we can construct an informationally complete set of six witness bases. Ta- 
ble 2.2 lists these six witness bases. Note that one inevitably needs an over- 
complete set since there may exist a repeated observable from a pair of bases. 
More details on the structures of informationally complete sets of two-qubit 
witness bases will be discussed in the next section. The wave plates in the 









Observables 


1 


1 


1 


Zjl + lZa, Z X Z 2 , 


X X X 2 + Y X Y 2 


2 


1 


X 


Z\\ — 1Z 2 , z±z 2 , 


X X X 2 - Y X Y 2 


3 


c* 


c 


X t l + 1Y 2 , XiY 2 , 


Y X Z 2 + Z X X 2 


4 


ct 


xc 


Xxl - IY 2 , X X Y 2 , 


Y\Z 2 — Z\X 2 


5 


c 


ct 


Y x l + lX 2 , Y X X 2 , 


Z X Y 2 + X X Z 2 


6 


c 




Y X 1-1X 2 , Y X X 2 , 


Z{Y 2 — X\Z 2 



Table 2.2: The six witness bases of the kind depicted in Fig. 2.1 that enable full 
tomography of the two-qubit state. The second and third columns list the unitary 
operators C/™ p and C/™ p that describe the effect of the wave plates WPs 1 and WPs 2, 
respectively, on the polarization of the incoming photons. The fourth column states 
the three two-qubit operators whose expectation values are determined when the 
eigenstate basis of the corresponding witness is measured. 
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input ports implement the unitary transformations U™ p and U^ 9 on the po- 
larization of photons 1 and 2, respectively, and so the incoming two-photon 
state p2 P h is transformed in accordance with 

P2 P h -> C/ 2 wp P2ph ([/ 1 wp *7 2 wp ) f (2-1-12) 

before the photons arrive at the half-transparent mirror. In effect, then, the 
family of optimal witnesses of the transformed witness basis is measured rather 
than the original family. The Clifford operator C is implemented by wave 
plates that yield the polarization changes 

|v) -> C|v) = (|v) + |h))/ v / 2, 

|h) -»• C|h) = i(|H> — |v))/V2, (2.1.13) 

possibly accompanied by an irrelevant over-all phase factor. 

i- SECTION 2.2 1 

Properties of two-qubit informationally complete 

witness bases 

We exhaustively list and investigate the set of informationally complete two- 
qubit witness bases that live in the simplest bipartite Hilbert space. Some 
observations are made regarding the structure and unitary equivalences of 
these bases. 

2.2.1 Construction 

We begin by parameterizing an entanglement witness 2H for a two-qubit sys- 
tem with three parameters (u±,U2, a), where a = 0, 1 and Uk = 1, 2, 3, with u\ 
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and U2 labeling the respective unitary Weyl operators U\ and U2 for qubits 1 
and 2. Since we want to search for informationally complete sets of witness 
bases, a good strategy will be to use a complete set of mutually unbiased bases. 
For this, we will consider the (ordered) set of order-2 qubit Weyl operators 
{Z,X,iXZ}. These operators are order-2 since Z 2 = X 2 = (iXZ) 2 = 1. The 
labels u\ and U2 are each defined to refer to one of the three Weyl operators 
in the given order. For instance, u\ = 1 -H- U\ = Z±, u\ = 2 U\ = X\ 
and ui = 3 -H- Ui = \X\Z\ for this set of order-2 qubit Weyl operators that 
refer to qubit 1. The corresponding complementary operators V\ and V2, such 
that UkVk = —VkUk, will each refer to an operator from a list that is a cyclic 
permutation of the Weyl operators given above, that is {X ,iX Z ,Z} . There is 
in principle more than one list of complementary Weyl operators but we shall 
refer to the aforementioned list unless otherwise stated. 

The Schmidt decomposition of a two-qubit pure state is given by 

I }S-./A / - (2.2.1) 

3=0 

The decomposable witness defined as 

ma) = V 2 a (\ )( |) T2 ^ 

can be written as 

1 

j,k=0 

where a cyclic shift, effected by the unitary operator V 2 a , is applied to the 
kets of qubit 2 to account for non-unique orbits of witnesses. We recall that 
any operator can be written as functions of the Weyl operators since these 



(2.2.2) 
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operators are algebraically complete. This means that any such two-qubit 
projector \j, k) (j, k\ is given by 



\j,k) (j,k\ 



1 



i 



t Yl [(-i)-^i] m [(-i)- fe ^ 

m,n=0 



(2.2.4) 



By expressing 2B, given in Eq. (2.2.3), in terms of the Weyl operators and 
picking out the four operators that are measurable in a given two-qubit to- 
mography experiment to be 

\j, k + a) (j, k + a\ + \k, j + a) (k,j + a\ for all j, k (2.2.5) 



and 



\j,k + a)(k,j + a\ + \k,j + a)(j,k + a\ ,j^k, (2.2.6) 



we arrive at the equations 



\j,k + a) (j,k + a\ + \k,j + a) (k,j + a\ 



^\—jm—kn _|_ / j\—km—jn 

m,n=0 

\j,k + a) (k,j + a\ + \k,j + a) {j,k + a 



2 ! 



(2.2.7) 



m,n=0 

(2.2.8 



To extract the relevant independent observables from Eqs. (2.2.7) and 
(2.2.8), we note that 

V k ~ l = Vl = V k (2.2.9) 
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when the V^s are single-qubit unitary operators and 



2ivi (m — n)j 

e b = D5 mn , 



(2.2.10) 



where D = 2 in this case. From Eq. (2.2.7), 



0(m',n';a)= ]T (-!)*»' (-!)*»' 

3,fe=0 



n,n=0 

[(-i) an 'ufuz' + (-l)"™'^'^' 



(2.2.11) 



By looking at different values m! and n', we have 



m = 0, n = 
m = 0, n = 1 or m = 1, n = 

m = 1, n =1 



0(0,0; a) = 4, (2.2.12) 
O(0,l; a) = O(l,0; a) 

= 4[C7 1 +(-l) a «7 a ] , (2.2.13) 

0(1,1; a) = 8(-l) a [/iC/2 , (2.2.14) 



out of which two observables f7i + (— \) a U2, U1U2 can be extracted from 
Eqs. (2.2.13) and (2.2.14) respectively. From Eq. (2.2.8), we consider all the 
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four possible combinations 



l 



j = Q,k 



-¥ 



(-i) an c/f u\ 



n 



(2.2.15) 



m,n=0 



1 



3 =0,k = 1 or j = l,k 



-> ^ [(-l)- m " an + (-l) 



(a+l)n 



n 



m,n=0 



2FiT/ 2 [1 - (-l) a W 2 ] , 



(2.2.16) 



J = 1, A 



1 -> 




(2.2.17) 



from which the only other independent observable that can be extracted is 
V1V2 [1 — (— l) a ?7iC/ 2 ]. It can be verified that the six sets of three independent 
observables listed in Table 2.2 are easily obtained from the three simplified 
observable expressions. 

We need a total of 15 linearly independent observables to perform full 
tomography on a two-qubit state. To search for these sets of informationally 
complete observables, each of a pre-chosen set of 3x3 x2 = 18 observables is 
expressed in terms of the 15 Weyl basis operators X^Zf 1 (8) X| 2 Z| 2 *, where 
Pk and qk each takes the value or 1 and are not simultaneously zero. Next, 
we form a 18 x 15 observable matrix M Q ^ S , with each row representing an 
observable and having phase factor coefficients as matrix entries, to have an 
informationally complete set of witness bases. Thus, for a set of 18 observables 



*The identity operator is excluded. 
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{Ox, O2, ■ ■ ■ , Ois}, the observable matrix M Q b s satisfies the equation 




Mi,i Mi >2 



Mi,is 



*1 



\ 



2 



M 2 ,i M 2 , 2 



M 2 ,i 5 



(2.2.18) 



\o 18 / 



\Mi 8i i M 18)2 • 



Mi 8) i5 / 



\X\Z\X2Z2 j 




The task is then to look for the combination of settings {u\, u 2 , 0} such that 
M bs has 15 non-zero singular values. 

2.2.2 Local unitary equivalence 

There are altogether 3 x 3 x 2 = 18 different combinations of triplets (u\, n 2 , a) 
available to form a set consisting of six distinct triplets. Hence the total 
number of possible sets is (g 8 ) = 18564, which is tractable enough for us to 
perform an exhaustive search for all the full-rank sets^ . Using the list of Weyl 
operators given in the previous section, we find that there are altogether 1395 
sets that are informationally complete after the numerical search. 

These sets are categorized into six classes and within each class, all sets give 
exactly the same set of singular values of M b s - The first class contains only 
three members which are related by the order-3 qubit Clifford transformation 
Ci, defined in Eq. (2.1.11), on the entire reference set. Classes 2 to 6 each 
comprises a number of families of sets, each of which are generated by the local 
unitary transformation effected by the operator V\ on a reference set in the 
family, which amounts to changing the value of a. Some of the witness bases 
in a particular set of six are not affected by the transformation. We call a 

transformation that is effected on n witness bases out of the six in a particular 

tflere, a full-rank set corresponds to an observable matrix Mobs with 15 non-zero singular 
values 
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set to be an n-V\ transformation. Table 2.3 summarizes the results. Another 
symmetry is that these 1395 sets are invariant under a cyclic permutation of 
the list of V\ operators. 



Families 1-Vi 2-Vi 3-Vi 4-Vj 5-Vi 6-Vi Number of sets 

Class 2 

17 4 6 4 1 272 
2 331000 16 

Class 3 

8 4 6 4 1 128 

2 133000 16 

Class 4 

18 2 7 12 7 2 1 576 

Class 5 

18 4 6 4 1 288 

Class 6 

3 15 15 1 96 

Tabic 2.3: The results of local unitary equivalences for sets in Classes 2 to 6. Class 1 
contains only three sets which are mutually related by the qubit Clifford transforma- 
tion that permutes the qubit Weyl operators. The value under the column "1-Vi", 
for instance, gives the number of 1-Vj. transformations that are performed on a fixed 
reference set in each of the families that falls in the class. For example, the first 
row says that for each family out of 17 in Class 2, including the reference set, there 
exists a total of 16 sets with 15 of them generated by applying various types of n-V\ 
transformations on the reference set in the family. Families with the configuration 
(4,6,4,1,0,0), for instance, are due to the fact that two witness bases in the reference 
set of every family, having the same u 1 ,u 2 settings, are unaffected by the V± transfor- 
mations and so there are (?) = 4 1-Vi transformations, (g) = 6 2-Vi transformations, 
(g) = 4 3-Vi transformations and ( 4 ) = 1 4-Vi transformations. Every family in 
Class 4 has half of the 32 sets equivalent to the other half via an overall V\ transfor- 
mation on the entire set. For instance, sets that are generated by the 1-Vi and 5-Vi 
transformations on the reference set in a particular family are related via an overall 
Vi transformation and so on. Half the set generated by the 3-Vi transformations 
on the reference set is equivalent to the other half generated by the same type of 
transformations in the same manner. The entries under the last column includes the 
reference set in each family. There are 1392 informationally complete sets of witness 
bases out of these five classes. Together with the three sets in Class 3, there is a total 
of 1395 sets. 
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2.2.3 A summary 

There exist many full-rank solutions for the two-qubit case and we listed six 
classes of informationally complete sets of witness bases, with all sets giving 
the same singular values of M Q b s within each class. These informationally 
complete sets are invariant under a cyclic permutation of the complementary 
V operators. Finally we mention that the results presented here are valid for 
the list of V operators we used, and that the structures may vary if different 
choices of V operators are taken. For instance, a given V operator remains 
complementary if the operator U is multiplied to it. So there will be two such 
complementary operators for every operator U. Hence, we have a total of eight 
different lists of complementary V operators and every list, in general, gives 
different informationally complete sets and, therefore, different structures. 
The properties of the witness bases for quantum systems of larger dimensions 
are still largely unknown at this point. 

i- SECTION 2.3 1 

Adaptive witness bases measurement with state 

estimation 



We now have all the necessary tools to establish an adaptive scheme to measure 
the witness bases in such a way that the number of witness bases needed to 
detect the entanglement of the unknown state ptruc is optimized. Each time a 
witness basis is measured, a set of four frequencies is obtained and this can be 
used to partially estimate ptme using MLME. Since the MLME estimators are 
generally mixed states, there is a chance that the purity of a MLME estimator 
is lower than that of ptme, especially when ptmc is a nearly- pure state. If the 
measurement of a witness basis detects the entanglement of this estimator, 
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measuring the same witness basis could very likely detect the entanglement 
of 

ptrue- This is due to the trend that entanglement detection becomes more 
difficult as the purity of pt rue decreases. The extreme cases are the maximally- 
entangled Bell states and the separable maximally-mixed state. 

Defining the operators HJ^ and IDp* to be the outcomes of the product 
states, and II^ 1 * and UJ 11 to be those of the maximally-entangled states for a 
given witness basis, an adaptive strategy based on this idea is as follows: 

Adaptive witness bases measurement 

Starting from k = 1 and a witness basis, 

1. Obtain the frequency data by measuring the witness basis; 

• If the witness criterion is violated, escape the loop; 

• Otherwise, proceed to the following steps. 

2. If k > 1, combine these data with the previous ones and renormal- 
ize all the frequencies. 

3. Look for the MLME estimator /Smlme consistent with the total 
collected data. 

4. For each of the 6 — k witness bases left, choose the one which 
gives the minimum value of the function Ap\p2 — (p3 —p^) 2 , where 
Pj = tr jpMLMElIJ 1 * j are the probabilities of the outcomes IIJ 1 * 
from one of the 6 — k witness bases, calculated based on the MLME 
estimator. 



To investigate the performance of this adaptive measurement scheme, we 
perform simulations for both pure and full-rank mixed two-qubit states respec- 
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tively. Figure 2.2a shows the percentage of pure and mixed states detected by 



n 
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Figure 2.2: A simulation on the measurement of the set of six informationally com- 
plete two-qubit entanglement witness bases for 10 4 random two-qubit pure states, as 
well as full-rank mixed states, with the measurements done for one state at a time. 



a specific number of the six witness bases in a particular ordering by violating 
the witness criterion in Eq. (2.1.8). Figure 2.2b shows the plot generated using 
the adaptive strategy for choosing the subsequent witness basis based on the 
MLME estimator obtained using the accumulated measurement data. Doing 
so will reduce the mean number of witnesses required to detect entanglement 
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for a given pure state. Note that about 2% of the random pure states and 
about 67% of the random mixed states are undetected by the six witnesses 
without performing full tomography with the aforementioned strategies. 

The number of quantum states that are not detected by all the witness 
bases can be further reduced (Fig. 2.2c) by performing one additional step 
to check if there are separable states in the ML convex set produced by the 
accumulated incomplete measurement data after the witness criterion is not 
violated. The entanglement of a quantum state is considered to be detected 
when no separable states are present in the convex set, since subsequent wit- 
ness basis measurements ultimately reduce the size of the convex set to a 
single estimator — the true state ptme for large N — that was previously 
inside this larger set. 

To perform this search, we maximize the likelihood functional over the 
space of separable states and compare this maximum value with that obtained 
by maximizing the same functional over all states. If the former is lower than 
the latter, this means that the true ML estimators in the convex set cannot 
be separable. 

The iterative algorithm for the maximization is in fact very similar to that 
established in Ref. [RH03]. Without going through the derivation, we present 
the algorithm below: 



2.3. Adaptive witness bases measurement with state estimation 



111 



ML over the space of separable states 

Starting from k = 1, a fixed small parameter e and a separable state 



>16 



where tr{pi} = 1 and the randomly chosen kets 
subnormalized, 

1. Compute Rk as in Eq. (1.3.2); 

• Escape from loop if 



(p 2 I ) are 
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2 >16 
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• Otherwise, proceed to following steps 
Compute the new operators 
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2. Set k = k + 1 and repeat the iteration from the beginning. 
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With this additional step, the percentage of undetected pure states is 
reduced to practically zero (0.01%) and one needs no more than five witness 
bases to detect entanglement for the rest of the pure states. The improvement 
is even more dramatic for the mixed states, with a reduction from about 
67% to about 2.7%. The mean number of witness bases needed to detect 
entanglement for mixed states is higher than that for pure states. This is 
not surprising, since mixed states generally have lower entanglement and are, 
therefore, harder to detect. Also, the mixed states are more likely to be 
separable than the pure states. 



Chapter 3 



Quantum Process Estimation 



i- SECTION 3.1 1 

Introduction 

Quantum process tomography (QPT) is an important tool to characterize the 
operation of a given quantum channel* [MRL08, OPG + 04, PCZ97]. Such a 
characterization is needed, for example, when one attempts to construct a 
quantum channel comprising multiple logic gates, each carrying out a specific 
quantum process. One such quantum channel for entanglement distillation, 
for instance, would consist of controlled not cnot gates. A physical quantum 
process is described by a completely-positive map Ai. That is, given a partic- 
ular input quantum state p\ residing in the Dj-dimensional Hilbert space T~L, 
the resulting output state p Q in the D -dimensional Hilbert space /C is given 
by 

p = M( Pi ) = J2 K mPiKl, (3.1.1) 

m 

with the Kraus operators (Karl Kraus) K m satisfying the relation 
Em^" 1 ^-™ = ^-K.- The K m s are not unique and any other set of Kraus 
operators 

K 'm = y^mm'^W , (3.1.2) 
III' 

where the u mm is are the elements of a unitary matrix, also parameterizes the 
completely-positive map M. [NCOO]. 

*The words "quantum process" and "quantum channel" will be used interchangeably. 
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The idea behind QPT is to estimate such completely-positive maps with 
measurements. Much like quantum state tomography, the estimation of an un- 
known quantum process can be perceived as the estimation of a positive Choi- 
Jamiolkowski operator (Man-Duen Choi and Andrzej Edmund Jamiolkowski) 
-Etrue that is represented by a D\D x T)\D matrix [Cho75, Jam72]. Such 
an operator contains all accessible information about the quantum process. 
The standard QPT procedure involves the measurement of multiple copies 
of L different output states, with each output state corresponding to one of 
the L linearly independent input states p\ , thereby using a POM of, say, M 
outcomes. The unknown operator Etme is estimated by linear- inversion of the 
LM measurement frequencies, which consists of DfD^ linearly independent 
constraints. Like the linear-inversion procedure for quantum state estima- 
tion, the resulting estimator obtained may not be positive. If that is the case, 
the estimator cannot be used for statistical predictions. This failure occurs 
whenever the observed relative frequencies of the measurement outcomes do 
not have consistent interpretation as probabilities. What is, therefore, called 
for, is an estimation procedure that ensures a physically meaningful estimator 
whatever the measurement data may be. 

One statistically meaningful technique to obtain a positive estimator for 
-Etrue is the maximum-likelihood estimation procedure [PR04]. This can be 
applied to yield a unique estimator Eml as long as the measurement data 
obtained form a set of DfD^ linearly independent constraints. We say that 
this set of measurement data is informationally complete. However, the num- 
ber of linearly independent parameters increases rapidly with the dimensions 
and a complete characterization of -Etruc becomes unfeasible for complex pro- 
cesses. This is especially true when the quantum process acts on an infinite- 
dimensional Hilbert space [RKSM + 11]. The well-known method of Direct 
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Characterization of Quantum Dynamics (DCQD) [MRL08] was introduced to 
reduce the amount of measurement resources (the total number LN of copies 
measured) that are used for quantum process tomography. However, this 
method requires entangled input states and post-processing strategies that 
can be expensive when dealing with more complex quantum processes. 

A more straightforward and conceptually different approach is to resort 
to informationally incomplete QPT. With this approach, less measurement 
resources are used to obtain an estimator for the unknown quantum process 
to a fair amount of accuracy. As a consequence, there exists a convex set 
of infinitely many ML estimators which are consistent with the measurement 
data. To choose the estimator which is least-biased from the convex set, we 
invoke the maximum-entropy principle [Jay57a, Jay57b] and choose the esti- 
mator with the largest entropy. Such an incomplete QPT can also give useful 
information about the quantum channel. In a typical tomography experiment, 
with data from measuring a finite number of copies, the resulting quantum 
process estimator can never be exactly equal to E tTne since experimental fluc- 
tuations are inevitable. One can only obtain an estimator that is close to 
-Etme within a certain tomographic precision. Thus, MLME QPT is typi- 
cally useful in providing a unique estimator for an unknown quantum process 
within a suitable tomographic precision using fewer incomplete measurement 
resources. As will be shown, this reduction in measurement resources is more 
pronounced for unitary quantum channels. Since -Etrue is unknown, one com- 
mon practice is to gauge such a tomographic precision with another operator 
-Eprior that is close to Etraej based on some prior information one has about 
the constructed quantum channel. The availability of such a -E pr ior for a given 
Etvue will become useful and important in subsequent discussions. 

The estimators obtained using the aforementioned method are least-biased 
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with respect to the set of incomplete measurement data in the sense of the 
entropy of the quantum process. In Ref . [Zim08] , which is an analytical study 
of the conventional maximum-entropy method, the entropy functional for the 
Choi-Jamiolkowski operator E describing a quantum channel was introduced 
as S (E) = —ti{(E/Di)log(E/Di)} and this was shown to exhibit nice prop- 
erties. In particular, this concave channel entropy functional has a unique 
maximum in E and is zero only when the quantum channel is unitary since 
E/D\ is then a rank-1 projector. However, the analytical results in [Zim08] 
apply only to simple qubit channels and are difficult to extend to general 
quantum channels of greater complexity. We shall extend the strategy in 
§1.4.5 and establish adaptive iterative algorithms [TERH11] to search for the 
MLME estimator -Emlme which maximizes both the likelihood and entropy 
functionals using the channel entropy functional in [Zim08]. 

We first give some preliminary ideas on quantum process estimation in 
§3.2. Then, in §3.3, we will present the iterative MLME algorithm using 
variational principles to derive a steepest-ascent scheme and apply it to nu- 
merical simulations of two-qubit and three-qubit quantum channels. In §3.4, 
we will establish adaptive strategies to apply the MLME algorithm with the 
aim of minimizing the amount of measurement resources needed to perform 
incomplete QPT. 

i- SECTION 3.2 1 

Preliminaries of quantum process estimation 

The estimation of the completely-positive map M that describes an unknown 
quantum process, in the manner presented in Eq. (3.1.1), is isomorphic to the 
estimation of an unknown quantum state. This is a consequence of the well- 
known Choi-Jamiolkowski isomorphism [Cho75, Jam72, PR04]. Let us define 
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a maximally-entangled pure state = Ylj \ j)n ® \j)w /v'A m terms of 

the computational basis kets \j)^ <8> Here, the dimensions of the Hilbert 

spaces % and %' are both equal to the dimension D\ of the input Hilbert 
space. Using this basis, there exists a one-to-one correspondence between the 
map M and a unique positive operator E defined as follows: 

E =Di {X H ® £ w ) (|* + ) 

= X) (li> <*l) ® A< (li> <*|) , (3-2.1) 

with Z% being the identity map. From Eq. (3.1.1), the alternative expression 

E = J2\^m)(^m\ , (3.2.2) 
m 

with 

IVU = {l H ®K m )\^+)^/D i , (3.2.3) 

implies that the rank of E is equal to the number of linearly independent 
K m s. It follows that E is rank-1 if the completely-positive map is described 
by a single unitary Kraus operator, and only then. 

The output state can be expressed in terms of E by means of 

Po = tT n {E(p?®l K )} , (3.2.4) 

where the transposition is defined with respect to the computational basis. 
Hence, reconstructing the quantum process amounts to estimating the positive 
operator E. To do so, one requires a total of DfD^ real parameters to specify 
the corresponding matrix. In the subsequent analyses, we shall consider trace- 
preserving maps, that is trjp;} = tr{/9 } for any p;, in which case the number 
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of independent parameters is reduced to Df(D^ — 1), with the constraints 
compactly written as 

tr K {E} = 1 H . (3.2.5) 



To estimate E, typically a set of L input states , with TV copies each, are 
sent through the quantum channel, one state at a time. The output state p^ 
that corresponds to /Oj is measured with a POM consisting of M outcomes 
n m > such that ^ m II m = ljc- The probability of getting outcome II m 
for the input state is given by pi m = tr$E ^pf^ T <8> II TO ^ \ I L. Here, 

P'l = Y.mPlm = 1/L. 

If the LM parameters comprise DfD^ linearly independent ones, the mea- 
surement data will be informationally complete. One can thus perform a com- 
plete quantum process estimation using the maximum-likelihood (ML) algo- 
rithm [PR04] and so obtain a unique positive estimator Eml by maximizing 
the likelihood functional 

L / M \ 

1=1 \m=l J 

where the number of occurrences n\ m for the outcome II m obtained in an 
experiment with the input state pf^ are such that n\ = ni m = N . 

SECTION 3.3 



The iterative algorithm 



We consider the optimization of the information functional 



X(A; E) = XS(E) + ^ log £({n lm }; E) , (3.3.1) 
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where A is a parameter which scales the entropy relative to the normalized 
log-likelihood and should be chosen with a very small value. When the mea- 
surement data are informationally complete, one sets A to zero and optimizing 
I(X = 0;E) amounts to the ML problem [PR04, FH01]. In the same spirit 
as in §1.4.5, both our knowledge from the measurement data (contained in 
log (£({ni m }; E)) /LN which measures the information gain) and our igno- 
rance (reflected in S(E) which measures the lack of information) about the 
operator E are taken into account in such a way that our ignorance takes an in- 
finitesimal weight. This introduces a small and smooth convex hill over the set 
of positive ML estimators which selects the one with the largest entropy. As in 
[TZE+11], the value of A may be chosen such that both log (£({n; m }; E)) /LN 
and S(E) remain almost constant with respect to A. 

To maximize X(A; E) with respect to E, we define the variation E + 8E = 
(1 + Z^)E(1 + Z), where Z is a small arbitrary operator such that Eq. (3.2.5) 
is satisfied, that is: t?{bE} = 0. Thus the most general expression for Z is 



with an unrestricted infinitesimal 6*4. On the other hand, the variation of 
I(X;E) with respect to E gives tr{6£'Vy}, where 



and fi m = ni m /LN. Since Z is small, the operator 1 + Z can be expressed as 



in terms of the first-order variations 6 A and bA^ . In deriving the expression 



1 + z = (1 + 6.4) 




(3.3.2) 




(3.3.3) 




(3.3.4) 
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above, the approximation 



(l + 0)-3 (3.3.5) 



for a small operator (f> is used. The variation bE is thus evaluated as 

bE = {l + Z^)E{1 + Z) - E 

= (SA* - * tr K {&A*E + EbA} ® lA E 

+ E (bA - X - \.y k \bA^E + EbA} ® 1^ • (3.3.6) 



Hence 



5Z(A; £) 
= tr{6£ W} 

= tr{bA^EW - - (tv K {bA j E} ® 1 K ) £iy 



^ £ (tr* {s^t^J ®1^ + h.c.} 
= tr!^bA j E (^W - ^tr K {WE + EW} ® + c.c. . (3.3.7) 

By imposing 6Z(A; E 1 ) > 0, the method of steepest ascent leads us to 

bA = bA^= e -(w- 1 - tr,e {VFE + EW} ® 1^ (3.3.8) 

for some small e > 0. Hence, to obtain the MLME estimator Emlme, one 
simply fixes A <C 1 and iterates the equations 
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MLME QPT iterative equations 

l + Z n = (S4)„ + 1 - - tr K { (5.4)* E n + E n (54) n } ® 1 K 
(64) n = | \W n - i trjc {VF n £ n + £„W n } ® 1^ , (3.3.9) 

where the expression for Z n follows from Eq. (3.3.2) and W n denotes the 
operator W in Eq. (3.3.3) evaluated for E n . One may do so by starting from 
a randomly chosen operator Eq and continue until the extremal equation for 
-E-mlme is satisfied with some pre-chosen numerical precision. To derive this 
extremal equation, we define the Lagrange functional [PR04] 

V(E) =1(\;E) -tr{ A E} (3.3.10) 

with the Lagrange operator A = h ® 1/c for the constraints in Eq. (3.2.5), 
where h is a Hermitian operator. Setting the variation of T>(E) to zero gives 
the extremal equation 

A-EmlmeA = Wmlme-^mlmeWmlme (3.3.11) 

with A = Jtrx I^mlme-EmlmeWmlme} ® 1/c- 

Thus far, we have been assuming that the measurement outcomes H m give 
perfect detection of quantum systems. The iterative equations in Eq. (3.3.9) 
can be generalized to the case of imperfect detection. As always, if each of the 
M measurement outcomes II m is assigned a detection efficiency r) m < 1, one 
can define a new set of M measurement outcomes U m = r] m Tl m such that G = 
Ylm Am 7^ ljc- It follows that the probabilities pi m = tr^E (pf^ r fi m ^ j j L 
do not sum to unity. 
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The likelihood functional, in this case, turns out to be 



(3.3.12) 

where ^ hi = N truc is the unknown total number of copies and the primed 
quantities are defined as in §3.2. Stirling's formula then gives 

log £'({n lm }; E) « AWe (log N tme - 1) - ^ [h - n[) (log (hi - n[) - l) 

+ XI (" z ~ re log ( x ~ p ' 1 ) + ^ nim logpim + const- • 

I ^ ' lm 

(3.3.13) 

The corresponding derivative 

6 J^^m = log ( »-u-ti) (3 .3. 14) 

bni y hi-n l J 

is zero for the most-likely n;, which is given by 

- -v' 

n, = n{ + JV^-£. (3.3.15) 

Hence, 



lm — ' 1 \ 



In short, the iteration procedure of Eq. (3.3.9) can still be used with the new 
set of POM outcomes IT m provided that the operator W in Eq. (3.3.3) is 
replaced by W — Wo, where 
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accounts for the copies that escape detection. 

As an example, we apply the algorithm to numerical simulations on two- 
qubit channels, the CNOT gate described by the unitary operator 



I 1 

10 

1 

y o o i o 



\ 



(3.3.18) 



and a randomly generated non-unitary quantum channel described by a full- 
rank Choi-Jamiolkowski matrix, as well as the three-qubit Toffoli gate de- 
scribed by the unitary operator 



^10000000^ 



01000000 
1 
1 
1 
1 
00000001 
1 



Toffoli : 



(3.3.19) 



To quantify the discrepancy between an MLME estimator and the true Choi- 
Jamiolkowski operator -Etrue) we use the trace-class distance 

Ar (-EmLME, £true) = ^-^tr j | i^MLME — ^true|| • (3.3.20) 

In these simulations, we take the Dj-dimensional projectors of a SIC POM as 
the input states. As shown in Fig. 3.1, using the MLME algorithm for QPT 
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Figure 3.1: Numerical simulations on the two-qubit (d = 2 2 ) and three-qubit (d = 2 3 ) 
quantum channels where Z?j = D Q = d. The projectors of symmetric informationally 
complete POMs (SIC POMs) are chosen as the linearly independent input states for 
all the simulations (L = d 2 ). For the measurements, informationally complete POMs 
consisting of tensor products of qubit SIC POMs are used (M = d 2 ). Each qubit SIC 
POM consists of a set of pure states whose Bloch vectors form the "legs of a tetrahe- 
dron" in the Bloch sphere. For the two-qubit channels, N = 10 4 and an average over 
50 experiments is taken to compute the trace-class distances. For the three-qubit 
channel, the measurement data are generated without statistical noise. For unitary 
channels, one can see that the MLME algorithm can still give fairly accurate esti- 
mations with a smaller number of input states than that of a linearly independent 
set. Numerical simulations of arbitrary two-qubit and three-qubit unitary channels 
suggest that the number is approximately d 2 /2 for SIC POM input states, above 
which there is insignificant tomographic improvement. 
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can give fast convergence in terms of tomographic efficiency with a reduced 
number of input states as quantum resources. This reduction is especially 
significant for unitary processes, where the Choi-Jamiolkowski operators are 
rank-1. For nonunitary quantum processes described by matrices of larger 
rank, the tomographic efficiency will be lower as shown in the first plot of 
Fig. 3.1. This is expected in analogy with quantum state tomography where 
it is more difficult to reliably estimate highly-mixed states than nearly-pure 
ones. 

r- SECTION 3.4 1 

Adaptive strategies 

An interesting question to ask with regard to incomplete QPT is whether one 
can perform it in an optimal way given the available resources by means of 
adaptive strategies. Here optimality refers to the minimization of the amount 
of resources (input states or measurements) used to perform incomplete QPT 
such that the distance between -Emlme and E tiue reaches a certain desired 
value. Very frequently, despite the fact that E tTue is always unknown, one has 
a rough idea of an operator -Bprior which may be close to E true based on some 
prior information about the unknown -Etme- This scenario is reasonable and 
typical when one designs a quantum channel experimentally which performs 
an expected quantum operation, with errors arising from imperfections of the 
components that make up the channel. We shall establish adaptive strategies 
which make use of such an operator in order to select, with the help of the 
MLME algorithm, resources for incomplete QPT in an optimal way. We 
refer to such tomography schemes as the adaptive MLME quantum process 
tomography (AMLME QPT) schemes. 

We will focus on adaptive strategies to choose the input states optimally. 



126 



Chapter 3. Quantum Process Estimation 



This can be reviewed in two separate cases: The case in which a fixed set of 
linearly independent input states is used (§3.4.1) and that in which arbitrary 
input states can be generated for incomplete QPT (§3.4.2). Adaptive strate- 
gies to choose the POM are relatively harder to formulate and this task is put 
aside for future studies. 

3.4.1 Optimization over a fixed set of linearly independent 
input states 

In the previous section, we considered the projectors of the SIC POMs, which 
are known to have optimal tomographic efficiencies, as input states in the 
numerical simulations. Since these POMs are symmetric in the sense of 
Eq. (1.2.21), any ordering of the input states in a given set gives the same 
plots in Fig. 3.1. In practice, however, such entangled states are difficult to 
produce and one typically has access to a set of separable states [RKS + 06] for 
measurements instead. In this case, there no longer exists such a symmetry 
and the tomographic performance depends on the order of the input states 
chosen, possibly strongly so. We propose to optimize the tomographic per- 
formance by choosing the input states adaptively based on the measurement 
data collected from the previously chosen input states, thereby using the prior 

-Eprior • 

To describe the adaptive strategy, let us consider a set of L > Df input 
states in which Df of them are linearly independent. Suppose that N, which is 
a fixed integer for all input states, copies of a randomly chosen input state p[ 
are sent through the quantum channel and the first set of measurement data 
{^li, . . . , vim}i Ylm u ^m = 1) i s collected. With these data f\ m = fi m , one 
obtains the first MLME estimator -E^lme - To select the next input state out 
of the remaining L — 1 states, we take -E pr ior as a gauge for .Etrue to generate 
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L — 1 sets of probabilities respectively from the L — 1 states. Each set of 
probabilities is then treated as the set of frequencies ■ ■ ■ i ^2m)' f° r the 

corresponding input state k. Hence, one has L — 1 sets of measurement data, 
each set being the combined data {i>n, . . . , v\m, v t\ ■ ■ ■ > 4mI/^ w hh the 
normalized frequencies /i m = vi m /2 and = such that ^ m (/i m + 

/im) = 1 f° r eac h fe> an d the corresponding L — 1 projected MLME estimators 

£(2) 

MLME, A; ' 

The value of k is selected such that a chosen figure of merit which quantifies 
the distance between E^ hME k and -E^ LME is the largest, so that there is a high 
chance for the next MLME estimator to be closer to -Etrue- As an example, the 
figure of merit is taken to be the trace-class distance T> tr ^mlme fc> -^mlme) ■ 
With this input state, the second estimator i%LME ^ s then obtained with 
MLME QPT. One repeats this procedure for subsequent input states until the 
distance T> tT ^mlme' -^mlme) * s below some preset threshold. An alternative 
to this would be to minimize the trace-class distance V tr ^mlme fc' -^prior^ • 

It is important to understand that in this strategy, the prior information 
-Eprior is not used to reconstruct the unknown quantum process in any way. 
It serves only as a means to optimally select the input states from the given 
set so as to maximize the tomographic convergence. This adaptive strategy 
also relies partially on the measurement data obtained in the experiment. We 
have thus introduced an operational method of using the prior information 
to minimize the amount of resources needed to perform reliable MLME QPT 
without introducing any artifacts coming from the prior information into the 
reconstruction procedure. To summarize, the adaptive MLME strategy is as 
follows: 
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Adaptive MLME algorithm (Fixed set of input states) 

1. Randomly choose p[^ from the set of L input states and set I = 1. 

(a) Perform QPT using pf^ and obtain the set of frequencies 
{yi\i ■ ■ ■ , Vim}, Em V lm = 1- 

(b) Set v = \J j=1 {vj\, • • • , vjm}/ I- 

(c) Invoke the MLME algorithm with v and obtain i^LME- 
Use -Eprior to compute the frequencies {^S l5 • • • , M }, 
Ylim u i+im = ^' f rom the remaining input states, with k la- 
beling the remaining L — I states. 

(d) Define L — I sets of accumulated frequencies 
(y U {y\+i i, • • • , ^+im})/(^ + 1) an d calculate the L — I 
projected MLME estimators -E-MLMEfc- 

(e) Set pf +1 ^ as the input state corresponding to k such that 
V tr (^Slme^'^mlme) is largest. 

2. Set 1 = 1 + 1 and repeat Steps l(a)-l(e). 



3.4.2 Optimization over the Hilbert space 

More generally, the adaptive strategy may be extended to the case in which 
one has access to the entire Hilbert space of states. In other words, the task 
is to search for the next optimal input state p[ L+1 ^ from the A-dimensional 
Hilbert space based on the measurement data v\ m obtained in the experiment 
from L previously chosen input states, where Ylm^im = 1 f° r an an d the 
prior information -Bprior about the unknown quantum process. 
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To this end, we define the normalized projected log-likelihood functional 



logl({u lm };E,p) = J2^fl^g{p lm ) + J2TTl logiPm) ' (3A1) 

lm m 



where 



Pirn = tr< E L + 1 — >, (3.4.2) 

v m = tr{E prior p ® II m j and p m = tr<^ E ^ ) (3.4.3) 

with I always running from 1 to L over all previously used input state labels. 

This projected log-likelihood functional is a good approximation to the 
log-likelihood functional for the situation in which the state p is chosen as 
the next input state for the experiment as long as -E pr ior is not too far away 
from Strue- The projected frequencies v m estimate the actual frequencies 
one gets when measuring the input state p. An optimal input state p[ L+1 ^ 
and the corresponding Choi-Jamiolkowski operator are chosen as the positive 
estimators that maximize this projected log-likelihood functional. 

Coincidentally, this maximum projected log-likelihood (MPL) proce- 
dure is equivalent to minimizing the cross entropy functional C(E,p) = 
— log£({fj m }; E, p) [JM09, EFS05] over all positive operators subjected to 
the respective constraints for p and E. Hence, another way of understanding 
this procedure is to first regard both the incomplete measurement data col- 
lected after using L input states and -E pr ior as the full prior knowledge one has 
about the unknown E^ TUe . The statistical motivation for MPL or minimiz- 
ing C(E,p) is, loosely speaking, to obtain estimators which are as compatible 
with this prior knowledge as possible by minimizing the entropy of the prior 
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knowledge C(E,p). We will provide some more arguments related to this 
optimization technique in the later part of this section. 

To carry out the optimization, we consider the response of 
log C{{vim}; E, p) to small variations of both p and E. After some similar 
calculations as in §3.3, we obtain the MPL iterative equations 



(1 + K(i + 



Pn+1 



tr w {(l + e 2 E n )p n (l + e 2 H n )} 



(3.4.4) 



where Z n is defined by Eq. (3.3.2) with 



(8A) n = y (x n - -tr K {X n E n + E n X n } ® 1 K 



X n 



Irn 



(l)T 



II, 



p lm {L + lf 



V m P 



Pm (L+l) 2 ' 



(3.4.5) 



and 



y n = tr/c < 



E in g n r , 
L + l 



x log (jj m ) E piior + 



-E 



(3.4.6) 



The MPL estimators satisfy the extremal equations 



A-EmplA = ^mpl-Empl^mpl , 

/SMPL34lPL = JWlPMPL = tr^jO^lPLPMPlJPMPL 



(3.4.7) 
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where 

A = ^t^K {^mpl-Empl^mpl} ® Ik ■ (3.4.8) 

The small parameters e\ and £2 are positive numbers. Thus, to carry out the 
MPL procedure, one iterates Eqs. (3.4.4) until Eqs. (3.4.7) are satisfied with 
a preset numerical precision. 

There is one important feature of this optimization scheme. From 
Eq. (3.4.1), we note that log£({f; m }; E, p) is neither convex nor concave in 
p and hence can have multiple local maxima. Thus, the MPL optimization is 
nonconvex. 

To generate these local-maxima estimators, one can start from multiple 
randomly chosen starting points and perform the iterations. Thereafter, the 
state estimator /5mpl to be chosen as the next input state p[ L+1 ^ is such that 
its corresponding -Bmpl gives the largest trace-class distance away from the 
previous MLME estimator -Emlme> WFi ich is obtained from the data of the 
previously chosen L input states, over all generated pairs of MPL estimators 
(PMPLi -E-mpl)- Again, one may also minimize the distance between Empl and 

-^prior • 

Let us summarize the adaptive MPL-MLME strategy with the following 
scheme: 
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Adaptive MPL-MLME algorithm 

1. Randomly choose pj as the first input state and set / = 1. 

(a) Perform QPT using pf^ and obtain the set of frequencies 
{m, ■■■ , vim}, Em u im = 1- 

(b) Set v = \J j=1 {vji, v jM }/ 1- 

(c) Invoke the MLME algorithm with v and obtain -E-^lme- 

(d) Using -Eprior, generate a set of pairs of MPL estimators 
(/OMPL) -Empl)) where the states /5mpl were not part of the 
I input states previously used, by iterating Eqs. (3.4.7) from 
different, randomly chosen starting points. 

(e) Set Pj as the input state corresponding to the state esti- 
mator /5mpl such that V tr [EmpLi -^mlme) ^ s largest. 



2. Set 1 = 1 + 1 and repeat Steps l(a)-l(e). 



With this, let us first compare the performances of the three proposed 
schemes, namely the non-adaptive MLME scheme in §3.3, the adaptive MLME 
scheme in §3.4.1 and the adaptive MPL-MLME scheme. For this purpose, we 
consider two quantum processes, the first being an imperfect CNOT gate whose 
action is described by the Kraus operators 

K x = Vl-eC/cNOT and K 2 = Jl. (3.4.9) 

This first channel is a CNOT gate with probability 1 — e and does nothing to 
the input states with probability e, an imperfect CNOT gate represented by a 
rank-2 Choi-Jamiolkowski operator. The second process is described by the 
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Kraus operators 

K ± = VT^~eU CN0T and {Kj = yfiBj})i 2 , (3.4.10) 

where the 15 operators Bj are randomly generated and satisfy the equation 
^2jBjBj = lye- This second channel, which is represented by a full-rank 
matrix, is a CNOT gate with probability 1 — e and randomly perturbs the 
input states with probability e with additional noise. As an example, we set 
e = 0.1. Figure 3.2^ shows the numerical results. 

Next, to understand how this adaptive MPL-MLME strategy can lead to 
an optimization in tomographic performance, we need to know how increasing 
the number of input states used in AMLME QPT can affect the corresponding 
MLME estimators. Since we are considering only a subset of the full linearly 
independent input states in general, there exists a convex set of estimators 
Eml maximizing the likelihood functional for a given set of informationally 
incomplete measurement data. This means that the likelihood functional pos- 
sesses a plateau hovering over this convex set of estimators. As the number 
of input states L used increases, the likelihood plateau will either remain un- 
changed (if no additional information about E trvLe is gained after performing 
QPT with new input states) or decrease in size (if new independent informa- 
tion is obtained). Thus in general, the plateau will continue to shrink to a 
point when a full set of linearly independent input states is used. 

We conjecture that the adaptive MPL-MLME strategy optimizes the rate 
of decrease in the size of the likelihood plateau by maximizing the normalized 

^The set of input states used in Fig. 3.2, taken from Ref. [RKS + 06], is just one of the 
many possible choices one can use in quantum process tomography. It is important to 
understand that this set is by no means sanctioned to be the "standard" set of input states. 
Rather, these are four states of the six projectors of the standard six-outcome POM, but 
any four of the six states will serve the purpose equally well. 
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Figure 3.2: A comparison of three incomplete QPT schemes: the non-adaptive MLME 
scheme, the adaptive MLME scheme and the adaptive MPL-MLME scheme. Monte 
Carlo simulations are carried out on two different types of imperfect CNOT gates 
described in the text. Here, N = 10 4 and an average over 50 experiments is taken to 
compute the trace-class distances. For both the non-adaptive as well as the adaptive 
MLME schemes, the 16 linearly independent input states are chosen to be tensor 
products of projectors of the kets |0>, |1), (|0) + |1>)/a/2 and (|0) + |1) i)/\/2. For all 
schemes, the POM outcomes are chosen to be the tensor products of qubit SIC POMs. 
The tomographic performance of the adaptive MPL-MLME scheme is the best out 
of the three. The plots show that the tomographic efficiency can be further improved 
by optimizing the input states over the Hilbert space instead of restricting to a fixed 
set of linearly independent input states, albeit the small difference in tomographic 
performance between the two adaptive schemes for some quantum processes. 
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projected log-likelihood functional with respect to the next input state. A 
point of view to justify this conjecture is to interpret the maximum of the 
normalized log-likelihood functional log (£({ni m }; E)) /LN as the maximum 
information gain from the measurement data. When the number of copies 
N is infinite, the data are noiseless and the resulting maximum information 
gain is Ylim flm log (/zm)> which is the negative of the Shannon entropy of the 
measurement data. For finite N, the maximum information gain over the 
space of statistical operators will typically be lower than the true maximum 
due to the positivity constraint, especially when Et ine is highly rank-deficient. 
In this language, the MPL-MLME strategy attempts to maximize this max- 
imum information gain as much as possible via the optimization of future 
input states over the entire Hilbert space of statistical operators, using the 
normalized projected log- likelihood functional as an estimate for the actual 
normalized log-likelihood functional describing future measurements. This is 
a possible explanation for the optimal decrease in the likelihood plateau size 
since one has maximal knowledge about the unknown E truc gained with the 
optimized input states and so the ambiguity in the estimators is minimized. 

We illustrate this point by considering the imperfect cnot gate with e = 
0.1 described by Eq. (3.4.9). Since the boundary of the likelihood plateau is 

complicated, we shall estimate its size numerically by first generating Nq = 10 3 

* (i) 

ML estimators EyA labeled with the index j for a given set of measurement 
data. Next, in the same spirit as in numerical sampling, we can define the 
operator centroid 



for this generated set of estimators and the normalized Hilbert-Schmidt stan- 
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Figure 3.3: The dependence of the size of the likelihood plateau (A) and the nor- 
malized log-likelihood maximum on the number of input states. The respective per- 
formances of the non-adaptive MLME scheme, the adaptive MLME scheme and the 
adaptive MPL-MLME scheme are computed based on noiseless measurement data for 
an imperfect CNOT gate with e = 0.1. For both the non-adaptive MLME scheme and 
the adaptive MLME scheme, the 16 linearly independent input states are chosen to 
be tensor products of projectors ofthekets |0>, |1), (\0) + \l))/V2 and (|0) + 11> i)/\/2- 
For all schemes, the POM outcomes are chosen to be the tensor products of qubit 
SIC POMs. From the plot, the rate of decrease of A is the greatest with the adaptive 
MPL-MLME scheme. The increase in the normalized log-likelihood maxima with the 
adaptive MPL-MLME scheme may also be interpreted as greater maximum informa- 
tion gain after measurements using the optimal input states as compared to the other 
schemes. 
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dard deviation 

1 



£^itr^ML-^ML' 2 



A = — \ * 



(3.4.12) 



away from the centroid. Thus, < A < 1. For sufficiently large Nq, the 
size of the plateau may be well approximated by the spread A. Figure 3.3 
compares the respective performances of the the three proposed schemes by 
analyzing the size of the likelihood plateau and the maximum of the normal- 
ized log-likelihood functional. From Fig. 3.3, it is crucial to understand that 
A does not, strictly speaking, decrease monotonically with increasing height 
of the normalized log-likelihood functional. A counterexample is shown in the 
figure, that is a significant decrease in A for the adaptive MLME scheme as 
compared to the non-adaptive one with the corresponding slight decrease in 
its normalized log-likelihood maxima. We emphasize that what the adaptive 
MPL-MLME strategy exploits is the possible trend of this behavior. 

To end this part of the section, we comment that the aforementioned idea 
can be applied to adaptively choose the next set of POM outcomes IT,- based 
on the collected measurement data. However, to perform the optimization 
successfully requires the solutions to more technical problems which include 
ensuring that the POM outcomes are linearly independent after the optimiza- 
tion. This project is left for future studies. 

3.4.3 A combination of both adaptive strategies 

Let us begin this final part of the section by reviewing the nonconvex feature of 
the MPL-MLME strategy discussed in §3.4.2. The presence of multiple local- 
maxima estimators which are linearly independent is an important element 
of the MPL-MLME strategy as it provides linearly independent input states 
which are optimal for measurement based on the data obtained from the 
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experiments. In general, because of the nonlinearity of Eq. (3.4.7), it is difficult 
to determine the number of such linearly independent extremal solutions for 
a given set of measurement data by analytical means. One can only search 
for as many linearly independent local-maxima estimators pmpl as possible 
via numerical optimizations from different starting points within a reasonable 
time period. 




2 4 6 8 10 12 14 

Number of input states 



Figure 3.4: A comparison of three incomplete QPT schemes: the non-adaptive MLME 
scheme, the adaptive MLME scheme and a combination of the adaptive MPL-MLME 
scheme and the adaptive MLME scheme (hybrid scheme). Monte Carlo simulations 
are carried out on the imperfect CNOT gate with e = 0.1. Here, N — 10 4 and an 
average over 50 experiments is taken to compute the trace-class distances. For both 
the non-adaptive as well as the adaptive MLME schemes, the default set of 16 linearly 
independent input states are chosen to be tensor products of projectors of the kets 
|0), |1), (|0) + |l))/\/2 and (|0) + |l)i)/\/2. For all schemes, a set of 16 randomly 
generated positive operators, which are all linearly independent of one another, are 
used to form the POM. For this POM, the average repetition frequency of the adaptive 
MPL-MLME scheme is very high after four input states are used. The first input 
state for all schemes is chosen to be the same separable state = 1 00) (00|. For 
the third scheme, the second to the fourth input states (shaded region) are optimized 
using the adaptive MPL-MLME strategy and the subsequent input states are chosen 
via the adaptive MLME strategy using the default set of input states which excludes 
1 00) (00 1 . The plot shows that the overall performance of the combined strategy is 
better than the adaptive MLME strategy alone. 



Another technical subtlety is that these local-maxima estimators tend to 
repeat themselves during the optimization. Hence, a local-maxima estimator 
which was chosen as one of the input states earlier may reappear in later op- 
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timizations. The repetition frequency strongly depends on the POM chosen 
to measure the output states. The examples given thus far make use of the 
product tetrahedron measurements as the POM and the resulting MPL op- 
timizations give linearly independent estimators with few repetitions. This 
may not be the case for other types of POM. In view of this, another way of 
doing AMLME QPT is to use both adaptive strategies in §3.4.1 and §3.4.2 
interchangeably, the hybrid MLME strategy. For example, one can start with 
the adaptive MPL-MLME strategy for tomography and when the repetition 
rate increases as more input states have been used, one may switch to the 
first adaptive MLME strategy. Figure 3.4 suggests that such a hybrid MLME 
strategy can further improve the tomographic performance as compared with 
the adaptive MLME strategy alone. 

3.4.4 Fixed measurement resources 

Finally, we try to answer, with numerics, the following question: For a fixed 
value of LN, is it more beneficial, in terms of tomographic performance, to 
measure more input states with fewer copies per input state or to measure 
fewer input states with more copies per input state? In quantum state es- 
timation, it is well known that for a fixed number of measurement copies, 
it is better to measure more POM outcomes, an overcomplete set if possible 
[dBLDG08]. To see if there exists an analogous benefit to measure more input 
states in QPT, we performed a simulation with a fixed value of LN and show 
the results in Fig. 3.5. 

It turns out that the average trace-class distance is a monotonically de- 
creasing function of L, with the maximal L = Lml = 16. Hence, for a fixed 
amount of measurement resources, the advantage of increasing the different 
types of measurements carries over to quantum process estimation. However, 
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Figure 3.5: Numerical simulation on the imperfect two-qubit CNOT gate with random 
noise for fixed LN = 10 4 . An average over 50 experiments is taken to compute the 
trace-class distances. The adaptive MPL-MLME strategy is used when the number 
of input states L is less than 16. 



it is important to note that this does not contradict the fact that for a fixed 
average trace-class distance, one can use MLME to reduce the total number 
of measurement resources/settings by simply reducing the number of input 
states necessary to achieve this distance. This is because, as discussed previ- 
ously in §3.3 and also shown in Fig. 3.5, the improvement gained by increasing 
the number of input states L decreases rapidly with L, especially when the 
input states are chosen optimally. Put differently, it is not worth the trouble 
to increase L after some point, beyond which there is very little tomographic 
improvement. This point, which is the essence of AMLME QPT, cannot be 
overemphasized. Experimentally, this means that one need not perform full 
tomography to obtain a quantum process estimator within a certain preset 
error margin since other confounding variables contribute to the total experi- 
mental error anyway. 
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Chapter summary 

We have established adaptive numerical strategies to perform incomplete 
quantum process tomography. One may choose whichever strategy is con- 
venient to carry out tomography depending on the available types of mea- 
surement resources at hand. Each of these strategies combines the simplicity 
of incomplete quantum process tomography using quantum state estimation 
with good tomographic performances using optimization techniques. It can 
never be overemphasized that, although some prior information is necessary 
for each adaptive strategy, such information is never used in the estimation 
of the unknown quantum process. Rather, the prior information is utilized to 
adaptively select future input states, the input states in our context, based 
on the current measurement data, to optimize the tomographic performance. 
The discussions presented in this chapter, therefore, provide a means of obtain- 
ing estimators for the unknown quantum process using incomplete resources 
which are typically within reasonably good experimental precisions. These 
estimators are statistically meaningful in that they are least-biased with re- 
spect to a set of informationally incomplete measurement data and are hence 
suitable for partial characterization of quantum processes. This is in contrast 
with the standard quantum process tomography which generally requires a 
huge amount of informationally complete measurement resources. 
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Conclusion 



The frequentist's notion of quantum estimation serves as a very useful method- 
ology for estimating the identity of a given source of quantum systems or a 
quantum channel. In this dissertation, we have touched on several aspects 
of this theory. They involve the two main statistical principles of maximum- 
likelihood and maximum-entropy, both of which are celebrated approaches in 
the subject of classical parameter estimation. Numerical techniques were de- 
veloped to reconstruct quantum states and processes from the measurement 
data obtained. One important experimental application of these techniques, 
namely entanglement detection, was discussed in detail. Another important 
direction from the materials discussed in this dissertation would be to develop 
numerical methods for the construction of error bars that go with the recon- 
structed statistical or process operators. In view of this, we briefly mention 
that a methodology to construct what is called the region estimator for a given 
set of measurement data was discussed in a recent Workshop on Quantum To- 
mography (WQT@CQT 28 November - 02 December 2011). This estimator 
is a region of statistical operators that encloses the true state/process with 
a high probability, based on a pre-chosen likelihood ratio. Further improve- 
ments of this methodology with incomplete measurement data is a subject of 
future work. 
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Dual Superkets of the SIC POM 



The superkets of a D-dimensional SIC POM follows the trace relation 

These D 2 superkets are therefore not orthonormal to one another. To facil- 
itate the subsequent calculations, it is convenient to construct a set of D 2 
orthonormal superkets, denoted by |nj-)), out of the |llj))s. To do this, we 
use the following ansatz: 

^ = ^ + ^(wTmrm)' (A - 2) 

where ((nj-|nj-)) = 1. The inner product 

„ ± , Mv _ Da% k + a 2 + P 2 D\D + 1) + 2af3D(D + 1) 
^■ |Ufc //" " (D + l)(a 2 + f3 2 D 3 + 2af3D) " ^ 

suggests that a 2 + (3 2 D 3 (D + 1) + 2a(3D{D + 1) = for ((n||n^)) = S jk . This 
equation allows for a free variable a or (3. Choosing j3 = 1/D, we find that 
a = \/D + 1 — D — 1 . Hence a good choice of orthonormal superkets are 

|n|» = 1 J// v ; ; 1 /; ^ - . (A.4) 

' 3 " VD + 2- 2VDTT 
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Using these orthonormal superkets and after a tedious simplification, we ob- 
tain the matrix elements of T = J2i I Hz) fib | t° be 



J- = (n+ = ——^ r<5ifc+^T7— — r- ( A - 5 ) 

Jk x J 1 1 k// D(D + 1) 3 D 2 (D + 1) v ' 



This means that 

The form of the superoperator T is that of T = dl + bnP, where V = V 2 is 
a rank-1 projector. To invert this superoperator, we note that 

T = al + bnV 

= a(l-T) + (a + bK)V. (A.7) 

Since X — V and P are orthogonal projectors, the inverse of T is given by 

F l = -il-V) + — l — V 
a a + ok 



l x ^ 

a a 2 + a6« 



X- 2 | — P. (A.8) 



Using the parameters a = 1/D(D + 1), b = l/D 2 {D + 1), k = D 2 and 
we obtain 

J- 1 = £>(£> + 1)1- K»« n ^l ■ (A. 10) 

i v 

Thus, 

|©i» = -^In,}} = |n,»£p + 1) - |i» , (A.ii) 
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where 



e j \U k ))=\D(D + l)(U j 



1 



D{D + 1) 



DS jk + 1 
D 2 (D + l) 



|n fe » 
i 

" D ~ 



Ojk 



(A.12) 



as it should be. 
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Appendix B 

Wigner Functional in Fock 
Representation 



With the help of the relation between the Fock-state wave functions (x\n) and 
the Hermite polynomials H n (x) given by 

W " > = ^W^ /2 "" (I) ' (R1) 

the one-dimensional Wigner functional, defined as 

W(x, p) = 2 J dy e 2ipy (x-y\p\x + y) (B.2) 

for the dimensionless values x and p, for a given statistical operator p can be 
represented in the Fock basis as 



OO OO „ 

W(x,p) = 2 E E / d2/e 2ipy (x - y\m) (m\ p \n)(n\x + y) 

, n , nJ S v ' 

— Pmn 

oo oo . 

/^"^ *»<*-»<>*•<*+«'> 



7T 

m=0 n— 

r 2_ T1 2 OO OO 



— 3^ — ( - x -> 'JO / -i \fH /» 

2 E E Li == / V e^' 2 iW - a) tf n (y' + b) , 



where a = x — ip and 6 = x + ip. 
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In obtaining the final equation, a new variable y' = y — \p is introduced 
and the property H n (—x) = (—l) n H n (x) is used. The job is thus to evaluate 
the integral of the general form 

l(a, b; m,n) = J dy' e~ y ' 2 H m (y' - a) H n {y' + b) . (B.3) 

We shall first consider the case where m> n. To proceed, it is useful for 
us to understand the response of H n {x) when the argument is shifted by a 
constant a. We begin with the generating function 

e— 2 = J> n (^ (B.4) 



n=0 



for the Hermite polynomials. 
It follows that 



+n 00 + k 



J2H n (x + a)- = Y,H k (x)e 2 fc , 

(2 a y p +k 



n=0 k=0 

00 00 



j=0 k=0 J 



00 00 



(« ./ • > EE H^ {2(l) "~ k '" 



k\(n-k)\ 

k=0n=k v ; 



so that 



n=0 k=0 

00 n n ✓ 

= E^Eu)^)( 2a ) B -*. ( B - 5 ) 

n=0 ' fc=0 V y 

+ a) = E Q (2a) n ~ fc • (B.6) 
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Using Eq. (B.6), the integral in Eq. (B.3) turns into 
l(a,b;m > n) = (7) UK-W^W 



j=0 k=0 



V v ' 

y/n2 k fc! <5jfe (Orthogonality relation) 

fe! (-2ab) T 



A2 m (-a) m " n E (?) 

fc=o ^ ' 

! — n V / 



m\ (-2ab) r 



fc=0 

n / \ / ^ i \ n— k 



k J (n — k)\ 

U = n-k)^ = V* 2 m n! (-a)— V f " + ( ™ " ^ t^l . (B .7) 



j=0 

From the definition 

where Ln\y) are the associated Laguerre polynomials in y of degree n and 
order z/, we have 

l(a, b;m>n) = ^2 m n\ (-a) m " n 4 m ~ n ) (2ab) . (B.9) 

The corresponding expression for m < n requires the roles of m and n, as well 
as those of —a and b, to be interchanged. Thus 

l{a,b;m< n) = ^2 n ml b n ~ m L^" m )(2a6) . (B.10) 

With Eqs. (B.9) and (B.10), we can write 

oo oo 

W(x,p) = 2e~ x2 ' p2 £ ^PmnMnnfap) , (B.ll) 



m=0 n=0 
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where 



(-^"^S^ - ^) m " n Lt~ n \2x 2 + 2p 2 ) if m > 



i- 1 ) m \f¥^.( x + [ PT~ m Lt~ m) {2x 2 + 2p 2 ) if m < 



■M mn (z,p) = < 



or, with n< = min{m, n} and n> = maxjm, n}, 



n 



n 



(B.12) 



Atmnfop) = (-l) n< J ^-^(x + i s ^ n -^p) n >- n < L^>~ n <\2x 2 + 2p 



2 )- 
(B.13) 



Appendix C 

Formula for Computing the 
Non-classicality Depth 



From the definitions of the function 



^(«; r ) = ~/ ( dw ) ex P w| - j p(w) (c.i) 

and the Glauber-Sudarshan P function [Meh67] 

P( w ) = — (du) (-u*\p\u)eM 2 e wu *- w * u (C.2) 

with the overcomplete set of coherent states \u), similar manipulation in Ap- 
pendix B gives 



7r(l-r) n n V ml n\ 

m=0 n=0 



{du)u* m u n e'^-r[\ u \ 2 ^*n- au *)/V2} _ (c>3) 



By defining z = |z|exp(i#) = a/y2, the necessary integral for subsequent 
calculations from Eq. (C.3) is given by 

l(z,r;m,n)= (du)u* m u n e- — e" (z u ~ zu K (C.4) 
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By introducing the polar coordinates (du) = sdsdtp, we have 



l(z,r;m,n) = I dss m+n+1 e-^ d(pe -if(m-n) e -2i\z\s S m( V -9)/(l-r) 



JO 



=2we-Mm-n)J m _ n (-^h^ 

where the second equality is obtained via the integral definition 

Mv) = 7rl d(pe -Kw-y^<p) (c.6) 
^ Jo 

of the Bessel function (Friedrich Wilhelm Bessel) of the first kind Jfi(y) of 
integer order //. Using a new set of variables t = s 2 t/(1 — r) and supposing 
that m > n, 

(1 \ m+1 / I I \ Ta—n 

~r) ( 



1 -r 



In deriving the identity above, we make use of the fact that J^(— y) 
(— 1) M Jfi(y), which follows immediately from the generating function 



e KH)= ]T ^(y)^ (c . 



fl= — 0O 



where i is complex. To evaluate the integral in Eq. (C.7), we need a few 
identities for J^(y). Let us start by establishing the power series expansion 
for Jfi{y) with an integer order \i. For this, we need the expression for the fcth 
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derivative of Jaiy) with respect to y. From Eq. (C.6), 



dy 



My) 



y=0 



2tt 



2tt 



d(p (isin</j) e 



k P -i(w-y sin v) 



•A; /-27T 

2^ 



d</? (sin 99) e 



(C.9) 



Using the parametrization q = e l<p , 



dy) "» 



My) 



2n 



dip 



j<p _ -up 



dq {q-q 



2tt 



1 1 



2 k 2n J unit \q 
circle 

circle 



2i 



-on 



(g 2 ~ l) fc 



(CIO) 



The resulting contour integral can be evaluated using the Cauchy's Residue 
Theorem (Baron Augustin-Louis Cauchy), from which we have 



27Ti .funit ^ q k -r!<+> V r /-r/< + -l 

circle 



(C.ll) 



Since the pole q = of the complex function in the argument is of order 
k + fj, + 1, the corresponding residue can be calculated using the formula 



Res 



(<? 2 - i) A 







1 



d \ 



(k + n)\ \dqj 



k+fj, 



if - iy 



(C.12) 



9=0 
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Since 



(i 2 - 1)* 

1=0 v 7 



q=0 



q=0 



we have 



(_l)^(fc + M )! 



s (I) 



{2l-k- n)\ 
k\ 



2l-k-fi 



q=0 



1 



•1) 2 



Thus, the Maclaurin series (Colin Maclaurin) of J^{y) is given by 



00 / H \ k 



y=0 



fc! 



k=0 

oo 
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(C.13) 



(C.14) 



(C.15) 



where we note that [(A; — fx)/2]\ = oo when k < [i. After a change of variable 
k — > 2k + fx, we finally obtain the power series expansion 



fc=0 



k\(k + n)\ \2 



(C.16) 



which is very useful to obtain the necessary identities to proceed. We note 
that this formula is valid for any real number although we have derived 
it from Eq. (C.6) for integer fx. Since we are still considering the case where 
m > n, [i = m — n > 0. 
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relates the y-derivative of y^J^iy) to another Bessel function J^_i(y) that is 
one order lower. Next, 
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With these two identities, and the definition of the associated Laguerre 
polynomials in Appendix B, we have 
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Thus, using the integral representation in Eq. (C.19) for v = m — n > 0, 
we have 



l(z,T;m > n) = (-l) m -™7rn! (z*) m " n e"^^) 

\ m+l / 1 \ rn—n / \ \2 \ 



For the case where m < n, we make use of the property 

J^(y) = (-l)%(y) (C.21) 

and evaluate the integral in Eq. (C.5) using, again, Eq. (C.19), from which we 
obtain 

l(z,T;m< n) = (-l) m ~ n yr ml z"^^^ 

(-1 \ n+l / 1 \ n—m / I 1 2 \ 

^) (rb) 4"-) ( J^) ■ (c.22, 

Finally, using the results in Eq. (C.20) and (C.22), we have 
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(C.23) 

where n< = min{m, n} and n> = max{m, n}. 



Appendix D 

Uniqueness of the Hedged 
Likelihood Estimator 



We suppose that there exist two estimators p\ and pi that maximize the 
hedged likelihood functional £h({%}; p)- The concavity of Cn({rij}; p), which 
is equivalently expressed by the inequality 



£ H ({n;}; Api + (1 - A)p 2 ) > A£ H (K}; Pi) + (1 - A)£ H ({n i }; p 2 ) (D.l) 



for < A < 1, implies that the convex sum pi = Xpi + (1 — A)/5 2 also max- 
imizes £}i({nj}; p). In other words, if we vary the parameter A along the 
direction from p\ to p% and vice versa, the gradient of Cu({rij}; p) will al- 
ways be zero. Hence, making use of Eq. (1.5.3), we obtain two simultaneous 
equations inasmuch as 





(D.2) 



Adding the two equations and dividing the sum by 2, we have 



-ti{ Pl p 2 1 + p 2 p x 1 } + — tr|Aip2 + R2P1 




(D.3) 



Since both p\ and p2 and their corresponding inverses are full-rank, each 
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product of operators in the first trace term is also full-rank. Defining M = 
f>if>2 1 > we can express the first term in the eigenvalues of the full-rank 
operator M, i.e. 




>i 



>/3^ = /3 J D. (D.4) 

k 

For the second term, denoting pkj = tv{pi~Ilj}, a similar argument follows 
[RH04], namely 

-tr(i* lP2 + i? 2 pi| = T £ (^,* + — Pi,*J 

=N ^ f (plk±%A 

y 1p\,kV2,k J 

S v ' 

>1 

>iV^/ fc = iV. (D.5) 
fe 

Therefore the left-hand side of Eq. (D.3) is always larger than the right-hand 
side unless of course = 1 in the first term, which leads to pij = p2j needed 
for the equality in the second term. It follows that the operator M is the 
identity operator. This means that pip^ 1 = P2P\ l = 1 and so pi = p2, which 
concludes the proof. 
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